Abstract - We consider the problem of recovering a function 
over the space of permutations (or, the symmetric group) over n 
elements from given partial information; the partial information 
we consider is related to the group theoretic Fourier Transform 
of the function. This problem naturally arises in several settings 
such as ranked elections, multi-object tracking, ranking systems, 
and recommendation systems. Inspired by the work of Donoho 
and Stark in the context of discrete-time functions, we focus 
on non-negative functions with a sparse support (support size 
$\ll$ domain size). Our recovery method is based on finding the 
sparsest solution (through $\ell_0$ optimization) that is consistent with 
the available information. As the main result, we derive sufficient 
conditions for functions that can be recovered exactly from 
partial information through $\ell_0$ optimization. Under a natural 
random model for the generation of functions, we quantify the 
recoverability conditions by deriving bounds on the sparsity 
(support size) for which the function satisfies the sufficient 
conditions with a high probability as $n \to \infty$. $\ell_0$ optimization 
is computationally hard. Therefore, the popular compressive 
sensing literature considers solving the convex relaxation, $\ell_1$ 
optimization, to find the sparsest solution. However, we show 
that $\ell_1$ optimization fails to recover a function (even with 
constant sparsity) generated using the random model with a high 
probability as $n \to \infty$. In order to overcome this problem, we 
propose a novel iterative algorithm for the recovery of functions 
that satisfy the sufficient conditions. Finally, using an Information 
Theoretic framework, we study necessary conditions for exact 
recovery to be possible. 

Index Terms — Compressive sensing, Fourier analysis over sym- 
metric group, functions over permutations, sparsest-fit. 



I. Introduction 

FUNCTIONS over permutations serve as rich tools for 
modeling uncertainty in several important practical ap- 
plications; they correspond to a general model class, where 
each model has a factorial number of parameters. However, 
in many practical applications, only partial information is 
available about the underlying functions; this is because either 
the problem setting naturally makes only partial information 
available, or memory constraints allow only partial information 
to be maintained as opposed to the entire function - which 
requires storing a factorial number of parameters in general. 
In either case, the following important question arises: which 
"types" of functions can be recovered from access to only 
partial information? Intuitively, one expects a characterization 
that relates the "complexity" of the functions that can be 
recovered to the "amount" of partial information one has 
access to. One of the main goals of this paper is to for- 
malize this statement. More specifically, this paper considers 
the problem of exact recovery of a function over the space 
of permutations given only partial information. When the 
function is a probability distribution, the partial information 
we consider can be thought of as lower-order marginals; 
more generally, the types of partial information we consider 
are related to the group theoretic Fourier Transform of the 
function, which provides a general way to represent varying 
"amounts" of partial information. In this context, our goal is 
to (a) characterize a class of functions that can be recovered 
exactly from the given partial information, and (b) design a 
procedure for their recovery. We restrict ourselves to non- 
negative functions, which span many of the useful practical 
applications. Due to the generality of the setting we consider, 
a thorough understanding of this problem impacts a wide- 
ranging set of applications. Before we present the precise 
problem formulation and give an overview of our approach, 
we provide below a few motivating applications that can be 
modeled effectively using functions over permutations. 

A popular application where functions over permutations 
naturally arise is the problem of rank aggregation. This 
problem arises in various contexts. The classical setting is 
that of ranked election, which has been studied in the area 
of Social Choice Theory for the past several decades. In 
the ranked election problem, the goal is to determine a 
"socially preferred" ranking of n candidates contesting an 
election using the individual preference lists (permutations 
of candidates) of the voters. Since the "socially preferred" 
outcome should be independent of the identities of voters, the 
available information can be summarized as a function over 
permutations that maps each permutation $\sigma$ to the fraction 
of voters that have the preference list $\sigma$. While described in 
the context of elections, the ranked election setting is more 
general and also applies to aggregating through polls the 
population preferences on global issues, movies, movie stars, 
etc. Similarly, rank aggregation has also been studied in the 
context of aggregating webpage rankings [2], where one has 
to aggregate rankings over a large number of webpages. The bulk 
of the work done on the ranked election problem deals with 
the question of aggregation given access to the entire function 
over permutations that summarizes population preferences. In 
many practical settings, however, determining the function 
itself is non-trivial - even for reasonably small values of n. 
As in the setting of polling, one typically can gather only partial 
information about population preferences. Therefore, our 
ability to recover functions over permutations from available 
partial information impacts our ability to aggregate rankings. 
Interestingly, in the context of ranked election, Diaconis [3] 
showed through spectral analysis that a partial set of Fourier 
coefficients of the function possesses "rich" information about 
the underlying function. This hints at the possibility that, 
in relevant applications, limited partial information can still 
capture a lot of structure of the underlying function. 
Another important problem, which has received a lot of 
attention recently, is the Identity Management Problem or 
the Multi-object tracking problem. This problem is motivated 
by applications in air traffic control and sensor networks, 
where the goal is to track the identities of n objects from 
noisy measurements of identities and positions. Specifically, 
consider an area with sensors deployed that can identify 
the unique signature and the position associated with each 
object when it passes close to it. Let the objects be labeled 
$1, 2, \ldots, n$ and let $x(t) = (x_1(t), x_2(t), \ldots, x_n(t))$ denote the 
vector of positions of the $n$ objects at time $t$. Whenever a 
sensor registers the signature of an object, the vector $x(t)$ is 
updated. A problem, however, arises when two objects, say $i$ and $j$, 
pass close to a sensor simultaneously. Because the sensors 
are inexpensive, they tend to confuse the signatures of the 
two objects; thus, after the two objects pass, the sensor has 
information about the positions of the objects, but it only 
has beliefs about which position belongs to which object. 
This problem is typically modeled as a probability distribution 
over permutations, where, given a position vector $x(t)$, a 
permutation $\sigma$ of $1, 2, \ldots, n$ describes the assignment of the 
positions to objects. Because the measurements are noisy, to 
each position vector $x(t)$, we assign, not a single permutation, 
but a distribution over permutations. Since we now have a 
distribution over permutations, the factorial blow-up makes it 
challenging to maintain it. Thus, it is often approximated using 
a partial set of Fourier coefficients. Recent work [4], [5] 
deals with updating the distribution with new observations in 
the Fourier domain. In order to obtain the final beliefs one has 
to recover the distribution over permutations from a partial set 
of Fourier coefficients. 

Finally, consider the task of coming up with rankings for 
teams in a sports league, for example, the "Formula-one" car 
racing or American football, given the outcomes of various 
games. In this context, one approach is to model the final 
ranking of the teams using, not just one permutation, but a dis- 
tribution over permutations. A similar approach has been taken 
in ranking players in online games (cf. Microsoft's TrueSkill 
solution [6]), where the authors, instead of maintaining scores, 
maintain a distribution over scores for each player. In this 
context, clearly, we can gather only partial information and 
the goal is to fit a model to this partial information. Similar 
questions arise in recommendation systems in cases where 
rankings, instead of ratings, are available or are preferred. 

In summary, all the examples discussed above relate to 
inferring a function over permutations using partial information. 
To fix ideas, let $S_n$ denote the permutation group on 
$n$ elements and $f : S_n \to \mathbb{R}_+$ denote a non-negative function 
defined over the permutations. We assume we have access to 
partial information about $f(\cdot)$ that, as discussed subsequently, 
corresponds to a subset of coefficients of the group theoretic 
Fourier Transform of $f(\cdot)$. We note here that a partial set 
of Fourier coefficients not only provides a rigorous way to 
compress the high-dimensional function $f(\cdot)$ (as used in 
[4], [5]), but also has natural interpretations, which makes it easy 
to gather in practice. Under this setup, our goal is to characterize 
the functions $f$ that can be recovered. The problem 
of exact recovery of functions from partial information has 
been widely studied in the context of discrete-time functions; 
however, the existing approaches do not naturally extend to 
our setup. One of the classical approaches for recovery is to 
find the function with the minimum "energy" consistent with 
the given partial information. This approach was extended to 
functions over permutations in [7], where the authors obtain 
lower bounds on the energy contained in subsets of Fourier 
Transform coefficients to obtain better $\ell_2$ guarantees when 
using the function with the minimum "energy." This approach, however, 
does not naturally extend to the case of exact recovery. In 
another approach, which recently gained immense popularity, 
the function is assumed to have a sparse support and conditions 
are derived for the size of the support for which exact recovery 
is possible. This work was pioneered by Donoho; in [1], 
Donoho and Stark use generalized uncertainty principles to 
recover a discrete-time function with sparse support from a 
limited set of Fourier coefficients. Inspired by this, we restrict 
our attention to functions with a sparse support. 

Assuming that the function is sparse, our approach to 
performing exact recovery is to find the function with the 
sparsest support that is consistent with the given partial 
information, henceforth referred to as $\ell_0$ optimization. This 
approach is often justified by the philosophy of Occam's 
razor. We derive sufficient conditions in terms of sparsity 
(support size) for functions that can be recovered through $\ell_0$ 
optimization. Furthermore, finding a function with the sparsest 
support through $\ell_0$ minimization is in general computationally 
hard. This problem is typically overcome by considering the 
convex relaxation of the $\ell_0$ optimization problem. However, as 
we show in Theorem III.2, such a convex relaxation does not 
yield exact recovery in our case. Thus, we propose a simple 
iterative algorithm called the 'sparsest-fit' algorithm and prove 
that the algorithm performs exact recovery of functions that 
satisfy the sufficient conditions. 

It is worth noting that our work has important connections 
to the work done in the recently popular area of compressive 
sensing. Broadly speaking, this work derives sufficient con- 
ditions under which the sparsest function that is consistent 
with the given information can be found by solving the 
corresponding $\ell_1$ relaxation problem. However, as discussed 
below in the section on related work, the sufficient conditions 
derived in this work do not apply to our setting. Therefore, 
our work may be viewed as presenting an alternate set of 
conditions under which the $\ell_0$ optimization problem can be 
solved efficiently. 

A. Related Work 

Fitting sparse models to observed data has been a classical 
approach used in statistics for model recovery and is inspired 
by the philosophy of Occam's Razor. Motivated by this, suf- 
ficient conditions based on sparsity for learnability have been 
of great interest over years in the context of communication, 
signal processing and statistics, cf. [8], [9]. In recent years, 
this approach has become of particular interest due to exciting 
developments and wide ranging applications including: 

• In signal processing (see [10], [11], [12], [13], [14]), 
where the goal is to estimate a 'signal' by means of a 
minimal number of measurements. This is referred to as 
compressive sensing. 

• In coding theory through the design of low-density parity 
check codes [15], [16], [17] or in the design of Reed-Solomon 
codes [18], where the aim is to design a coding 
scheme with maximal communication rate. 

• In the context of streaming algorithms through the design 
of 'sketches' (see [19], [20], [21], [22], [23]) for the 
purpose of maintaining a minimal 'memory state' for the 
streaming algorithm's operation. 

In all of the above work, the basic question (see [24]) 
pertains to the design of an $m \times n$ "measurement" matrix 
$A$ so that $x$ can be recovered efficiently from measurements 
$y = Ax$ (or its noisy version) using the "fewest" possible 
number of measurements $m$. The setup of interest is when $x$ is 
sparse and when $m < n$ or $m \ll n$. The interesting 
results (such as those cited above) pertain to characterization 
of the sparsity $K$ of $x$ that can be recovered for a given number 
of measurements $m$. The usual tension is between the ability to 
recover $x$ with large $K$ and the use of a sensing matrix $A$ with minimal 
$m$. 

The sparsest recovery approach of this paper is similar (in 
flavor) to the above stated work; in fact, as is shown subse- 
quently, the partial information we consider can be written as 
a linear transform of the function /(•). However, the methods 
or approaches of the prior work do not apply. Specifically, the 
prior work considers finding the sparsest function consistent with 
the given partial information by solving the corresponding 
$\ell_1$ relaxation problem. It derives a necessary and 
sufficient condition, called the Restricted Nullspace Property, 
on the structure of the matrix $A$ that guarantees that the 
solutions to the $\ell_0$ problem and its $\ell_1$ relaxation are the same 
(see [11], [21]). However, such sufficient conditions trivially 
fail in our setup (see [25]). Therefore, our work provides an 
alternate set of conditions that guarantee efficient recovery of 
the sparsest function. 

B. Our Contributions 

Recovery of a function over permutations from only partial 
information is clearly a hard problem both from a theoretical 
and computational standpoint. We make several contributions 
in this paper to advance our understanding of the problem in 
both these respects. As the main result, we obtain sufficient 
conditions - in terms of sparsity - for functions that can 
be recovered exactly from partial information. Specifically, 
our result establishes a relation between the "complexity" (as 
measured in sparsity) of the function that can be recovered 
and the "amount" of partial information available. 

Our recovery scheme consists of finding the sparsest solution 
consistent with the given partial information through 
$\ell_0$ optimization. We derive sufficient conditions under which 
a function can be recovered through $\ell_0$ optimization. First, 
we state the sufficient conditions for recovery through $\ell_0$ 
optimization in terms of the structural properties of the func- 
tions. To understand the strength of the sufficient conditions, 
we propose a random generative model for functions with 
a given support size; we then obtain bounds on the size of 
the support for which a function generated according to the 
random generative model satisfies the sufficient conditions 
with a high probability. To our surprise, it is indeed possible to 
recover, with high probability, functions with seemingly large 
sparsity for given partial information (see the precise statements of 
the recoverability theorems in Section III for details). 

Finding the sparsest solution through $\ell_0$ optimization is 
computationally hard. This problem is typically overcome by 
considering the $\ell_1$ convex relaxation of the $\ell_0$ optimization 
problem. However, as we show in Example II-C.1, $\ell_1$ relaxation 
does not always result in exact recovery, even when 
the sparsity of the underlying function is only 4. In fact, a 
necessary and sufficient condition for $\ell_1$ relaxation to yield 
the sparsest solution $x$ that satisfies the constraints $y = Ax$ 
is the so-called Restricted Nullspace Condition (RNC) on 
the measurement matrix $A$; interestingly, the more popular 
Restricted Isometry Property (RIP) on the measurement 
matrix $A$ is a sufficient condition. However, as shown below, 
the types of partial information we consider can be written as a 
linear transform of $f(\cdot)$. Therefore, Example II-C.1 shows that 
in our setting, the measurement matrix does not satisfy RNC. 
It is natural to wonder if Example II-C.1 is anomalous. We 
show that this is indeed not the case. Specifically, we show 
in Theorem III.2 that, with a high probability, $\ell_1$ relaxation 
fails to recover a function generated according to the random 
generative model. 

Since convex relaxations fail in recovery, we exploit the 
structural property of permutations to design a simple iter- 
ative algorithm called the 'sparsest-fit' algorithm to perform 
recovery. We prove that the algorithm recovers a function from 
a partial set of its Fourier coefficients as long as the function 
satisfies the sufficient conditions. 

We also study the limitation of any recovery algorithm 
to recover a function exactly from a given form of partial 
information. Through an application of Fano's inequality from 
classical information theory, we obtain a bound on the sparsity 
beyond which recovery is not asymptotically reliable; a recov- 
ery scheme is called asymptotically reliable if the probability 
of error asymptotically goes to 0. 

In summary, we obtain an intuitive characterization of the 
"complexity" (as measured in sparsity) of the functions that 
can be recovered from the given partial information. We show 
how $\ell_1$ relaxation fails in recovery in this setting. Hence, the 
sufficient conditions we derive correspond to an alternate set 
of conditions that guarantee efficient recovery of the sparsest 
function. 

C. Organization 

Section II introduces the model, useful notations and the 
precise formulation of the problem. In Section III, we provide 
the statements of our results. Section IV describes our iterative 
algorithm that can recover $f$ from $\hat{f}(\lambda)$ when certain conditions 
(see Condition 1) are satisfied. Sections V to XI provide 
detailed proofs. Conclusions are presented in Section XII. 

II. Problem Statement 

In this section, we introduce the necessary notations, defi- 
nitions and provide the formal problem statement. 
A. Notations 

Let $n$ be the number of elements and $S_n$ be the set of all 
$n!$ possible permutations or rankings of these $n$ elements. Our 
interest is in learning non-negative valued functions $f$ defined 
on $S_n$, i.e. $f : S_n \to \mathbb{R}_+$, where $\mathbb{R}_+ = \{x \in \mathbb{R} : x \geq 0\}$. The 
support of $f$ is defined as 

$$\operatorname{supp}(f) = \{\sigma \in S_n : f(\sigma) \neq 0\}.$$

The cardinality of the support, $|\operatorname{supp}(f)|$, will be called the 
sparsity of $f$ and will be denoted by $K$. We will also call 
it the $\ell_0$ norm of $f$, denoted by $\|f\|_0$. 

In this paper, we wish to learn $f$ from a partial set of 
Fourier coefficients. To define the Fourier transform of a 
function over the permutation group, we need some notations. 
To this end, consider a partition of $n$, i.e. an ordered tuple 
$\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_r)$, such that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r \geq 1$, 
and $n = \lambda_1 + \lambda_2 + \cdots + \lambda_r$. For example, $\lambda = (n-1, 1)$ is 
a partition of $n$. Now consider a partition of the $n$ elements, 
$\{1, \ldots, n\}$, as per the $\lambda$ partition, i.e. divide $n$ elements into 
$r$ bins with the $i$th bin having $\lambda_i$ elements. It is easy to see that $n$ 
elements can be divided as per the $\lambda$ partition in $D_\lambda$ distinct 
ways, with 

$$D_\lambda = \frac{n!}{\prod_{i=1}^{r} \lambda_i!}.$$

Let the distinct partitions be denoted by $t_i$, $1 \leq i \leq D_\lambda$.¹ For 
example, for $\lambda = (n-1, 1)$ there are $D_\lambda = n!/(n-1)! = n$ 
distinct ways given by 

$$t_i \equiv \{1, \ldots, i-1, i+1, \ldots, n\}\{i\}, \quad 1 \leq i \leq n.$$

Given a permutation $\sigma \in S_n$, its action on $t_i$ is defined through 
its action on the $n$ elements of $t_i$, resulting in a $\lambda$ partition 
with the $n$ elements permuted. In the above example with 
$\lambda = (n-1, 1)$, $\sigma$ acts on $t_i$ to give $t_{\sigma(i)}$, i.e. 

$$\sigma : t_i \mapsto t_{\sigma(i)}, \quad \text{where } t_i = \{1, \ldots, i-1, i+1, \ldots, n\}\{i\} \text{ and } t_{\sigma(i)} = \{1, \ldots, \sigma(i)-1, \sigma(i)+1, \ldots, n\}\{\sigma(i)\}.$$

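Now, for a given partition $\lambda$ and a permutation $\sigma \in S_n$, define 
a 0/1 valued $D_\lambda \times D_\lambda$ matrix $M^\lambda(\sigma)$ as 

$$M^\lambda_{ij}(\sigma) = \begin{cases} 1, & \text{if } \sigma(t_j) = t_i, \\ 0, & \text{otherwise,} \end{cases} \quad \text{for all } 1 \leq i, j \leq D_\lambda.$$

This matrix $M^\lambda(\sigma)$ corresponds to a degree $D_\lambda$ representation 
of the permutation group. 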
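As a small illustration of these definitions, the following sketch computes $D_\lambda$ for an arbitrary partition and builds $M^\lambda(\sigma)$ in the special case $\lambda = (n-1, 1)$, where the matrix is simply an $n \times n$ permutation matrix. The function names and the 0-indexed tuple encoding of permutations (`sigma[j]` is the image of element `j`) are conventions assumed here for illustration; they do not come from the paper.

```python
from math import factorial, prod

def degree(lam):
    """D_lambda = n! / (lam_1! * ... * lam_r!) for a partition lam of n."""
    n = sum(lam)
    return factorial(n) // prod(factorial(part) for part in lam)

def first_order_matrix(sigma):
    """M^lambda(sigma) for lambda = (n-1, 1), as a list of rows.

    sigma is a tuple with sigma[j] = image of element j (0-indexed).
    Entry (i, j) is 1 exactly when sigma maps j to i.
    """
    n = len(sigma)
    return [[1 if sigma[j] == i else 0 for j in range(n)] for i in range(n)]

def compose(sigma, tau):
    """(sigma o tau)(k) = sigma(tau(k))."""
    return tuple(sigma[t] for t in tau)

def matmul(a, b):
    """Plain product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

sigma = (1, 0, 2, 3)   # swaps elements 0 and 1
tau = (0, 2, 1, 3)     # swaps elements 1 and 2

print(degree((3, 1)), degree((2, 2)))
# Representation property: M(sigma o tau) = M(sigma) M(tau).
print(first_order_matrix(compose(sigma, tau))
      == matmul(first_order_matrix(sigma), first_order_matrix(tau)))
```

The last check is what "representation" means operationally: composing permutations corresponds to multiplying their matrices.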

B. Partial Information as a Fourier Coefficient 

The partial information we consider in this paper is the 
Fourier transform coefficient of $f$ at the representation $M^\lambda$, 
for each $\lambda$. The motivation for considering Fourier coefficients 
at representations $M^\lambda$ is twofold: first, they provide a rigorous 
way to compress the high-dimensional function $f(\cdot)$ (as used 
in [4], [5]), and second, as we shall see, Fourier coefficients at 
representations $M^\lambda$ have natural interpretations, which makes 
them easy to gather in practice. In addition, each representation 
$M^\lambda$ contains a subset of the lower-order irreducible representations; 
thus, for each $\lambda$, $M^\lambda$ conveniently captures the 
information contained in a subset of the lower-order Fourier 
coefficients up to $\lambda$. We now define the Fourier coefficient of $f$ 
at the representation $M^\lambda$, which we call $\lambda$-partial information. 

¹To keep notation simple, we use $t_i$ instead of $t_i^\lambda$, which takes explicit 
dependence on $\lambda$ into account. 

Definition II.1 ($\lambda$-Partial Information). Given a function 
$f : S_n \to \mathbb{R}_+$ and a partition $\lambda$, the Fourier Transform 
coefficient at representation $M^\lambda$, which we call the $\lambda$-partial 
information, is denoted by $\hat{f}(\lambda)$ and is defined as 

$$\hat{f}(\lambda) = \sum_{\sigma \in S_n} f(\sigma) M^\lambda(\sigma).$$

Recall the example of $\lambda = (n-1, 1)$ with $f$ as a probability 
distribution on $S_n$. Then, $\hat{f}(\lambda)$ is an $n \times n$ matrix with the 
$(i, j)$th entry being the probability of element $j$ mapped to 
element $i$ under $f$. That is, $\hat{f}(\lambda)$ corresponds to the first order 
marginal of $f$ in this case. 
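The first-order case can be made concrete with a few lines of code. The sketch below assumes that a sparse $f$ is stored as a dictionary from permutations to weights (an encoding chosen here for illustration) and accumulates $\hat{f}(\lambda)$ for $\lambda = (n-1, 1)$ directly from the definition. The example distribution is hypothetical.

```python
def first_order_marginals(f, n):
    """Return hat{f}(lambda) for lambda = (n-1, 1) as an n x n list of rows.

    f is a dict mapping permutations (tuples, sigma[j] = image of j)
    to non-negative weights; permutations absent from f have weight 0.
    """
    marg = [[0.0] * n for _ in range(n)]
    for sigma, weight in f.items():
        for j in range(n):
            marg[sigma[j]][j] += weight  # element j is sent to sigma[j]
    return marg

# Hypothetical distribution over rankings of n = 3 elements.
f = {(0, 1, 2): 0.5, (1, 0, 2): 0.3, (2, 1, 0): 0.2}
Q = first_order_marginals(f, 3)
for row in Q:
    print(row)
```

Because $f$ is a probability distribution here, every row and every column of the resulting matrix sums to one.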

C. Problem Formulation 

We wish to recover a function $f$ from its partial 
information $\hat{f}(\lambda)$ based on partition $\lambda$. As noted earlier, the 
classical approach based on Occam's razor suggests recovering 
the function as a solution of the following $\ell_0$ optimization 
problem: 

$$\begin{aligned} \text{minimize} \quad & \|g\|_0 \quad \text{over } g : S_n \to \mathbb{R}_+ \\ \text{subject to} \quad & \hat{g}(\lambda) = \hat{f}(\lambda). \end{aligned} \tag{1}$$

We note that the question of recovering $f$ from $\hat{f}(\lambda)$ is very 
similar to the question studied in the context of compressed 
sensing, i.e. recover $x$ from $y = Ax$. To see this, with an 
abuse of notation imagine $\hat{f}(\lambda)$ as a $D_\lambda^2$-dimensional vector 
and $f$ as an $n!$-dimensional vector. Then, $\hat{f}(\lambda) = Af$ where each 
column of $A$ corresponds to $M^\lambda(\sigma)$ for a certain permutation $\sigma$. 
The key difference from the compressed sensing literature is 
that $A$ is given in our setup rather than being a design choice. 
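To make the analogy with $y = Ax$ tangible, the following sketch builds the matrix $A$ explicitly for $\lambda = (n-1, 1)$ and a small $n$. Each column is $M^\lambda(\sigma)$ flattened row by row; the flattening order, the function names and the example weights are assumptions made here, and enumerating all $n!$ columns is only feasible for very small $n$.

```python
from itertools import permutations

def column(sigma):
    """vec(M^lambda(sigma)) for lambda = (n-1, 1), read row by row."""
    n = len(sigma)
    return [1 if sigma[j] == i else 0 for i in range(n) for j in range(n)]

def measurement_matrix(n):
    """Return (perms, A): A has D^2 = n^2 rows and n! columns."""
    perms = list(permutations(range(n)))
    cols = [column(s) for s in perms]
    A = [[col[r] for col in cols] for r in range(n * n)]
    return perms, A

n = 4
perms, A = measurement_matrix(n)
print(len(A), len(A[0]))   # number of rows, number of columns

# A sparse f as a length-n! vector, and its partial information y = A f.
f_vec = [0.0] * len(perms)
f_vec[perms.index((1, 0, 2, 3))] = 0.6
f_vec[perms.index((0, 1, 3, 2))] = 0.4
y = [sum(a * x for a, x in zip(row, f_vec)) for row in A]
```

For $n = 4$ the matrix already has far more columns than rows, which is the regime in which sparsity assumptions become necessary.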

Question One. As the first question of interest, we wish 
to identify precise conditions under which the $\ell_0$ optimization 
problem (1) recovers the original function $f$ as its unique 
solution. 

Unlike the popular literature (cf. compressed sensing), such 
conditions cannot be based on sparsity only. This is well 
explained by the following (counter-)example. In addition, the 
example also shows that linear independence of the support 
of $f$ does not guarantee uniqueness of the solution to the $\ell_0$ 
optimization problem. 

Example II-C.1. For any $n \geq 4$, consider the four permutations 
$\sigma_1 = (1,2)$, $\sigma_2 = (3,4)$, $\sigma_3 = (1,2)(3,4)$ and $\sigma_4 = \mathrm{id}$, 
where id is the identity permutation. In addition, consider the 
partition $\lambda = (n-1, 1)$. Then, it is easy to see that 

$$M^\lambda(\sigma_1) + M^\lambda(\sigma_2) = M^\lambda(\sigma_3) + M^\lambda(\sigma_4).$$

We now consider three cases where a bound on sparsity is 
not sufficient to guarantee the existence of a unique solution 
to (1). 

1) This example shows that a sparsity bound (even 4) on 
$f$ is not sufficient to guarantee that $f$ will indeed be the 
sparsest solution. Specifically, suppose that $f(\sigma_i) = p_i$, 
where $p_i \in \mathbb{R}_+$ for $1 \leq i \leq 4$, and $f(\sigma) = 0$ for all 
other $\sigma \in S_n$. Without loss of generality, let $p_1 < p_2$. 
Then, 

$$\begin{aligned} \hat{f}(\lambda) &= p_1 M^\lambda(\sigma_1) + p_2 M^\lambda(\sigma_2) + p_3 M^\lambda(\sigma_3) + p_4 M^\lambda(\sigma_4) \\ &= (p_2 - p_1) M^\lambda(\sigma_2) + (p_3 + p_1) M^\lambda(\sigma_3) + (p_4 + p_1) M^\lambda(\sigma_4). \end{aligned}$$

Thus, the function $g$ with $g(\sigma_2) = p_2 - p_1$, $g(\sigma_3) = p_3 + p_1$, 
$g(\sigma_4) = p_4 + p_1$ and $g(\sigma) = 0$ for all other $\sigma \in S_n$ 
is such that $\hat{g}(\lambda) = \hat{f}(\lambda)$ but $\|g\|_0 = 3 < 4 = \|f\|_0$. 
That is, $f$ cannot be recovered as the solution of the $\ell_0$ 
optimization problem (1) even when the support of $f$ is only 
4. 

2) This example shows that although $f$ might be a sparsest 
solution, it may not be unique. In particular, suppose 
that $f(\sigma_1) = f(\sigma_2) = p$ and $f(\sigma) = 0$ for all other $\sigma \in S_n$. 
Then, $\hat{f}(\lambda) = p M^\lambda(\sigma_1) + p M^\lambda(\sigma_2) = p M^\lambda(\sigma_3) + p M^\lambda(\sigma_4)$. 
Thus, (1) does not have a unique solution. 

3) Finally, this example shows that even though the support 
of $f$ corresponds to a linearly independent set of 
columns, the sparsest solution may not be unique. Now 
suppose that $f(\sigma_i) = p_i$, where $p_i \in \mathbb{R}_+$ for $1 \leq i \leq 3$, 
and $f(\sigma) = 0$ for all other $\sigma \in S_n$. Without loss of 
generality, let $p_1 < p_2$. Then, 

$$\begin{aligned} \hat{f}(\lambda) &= p_1 M^\lambda(\sigma_1) + p_2 M^\lambda(\sigma_2) + p_3 M^\lambda(\sigma_3) \\ &= (p_2 - p_1) M^\lambda(\sigma_2) + (p_3 + p_1) M^\lambda(\sigma_3) + p_1 M^\lambda(\sigma_4). \end{aligned}$$

Here, note that $\{M^\lambda(\sigma_1), M^\lambda(\sigma_2), M^\lambda(\sigma_3)\}$ is linearly 
independent, yet the solution to (1) is not unique. 
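The identity underlying Example II-C.1, and the first of its three cases, can be checked numerically. The sketch below takes $n = 4$, relabels the elements as 0 to 3, and uses integer weights so that matrix equality is exact; the weights and helper names are hypothetical choices made here for illustration.

```python
def perm_matrix(sigma):
    """M^lambda(sigma) for lambda = (n-1, 1): entry (i, j) is 1 iff sigma(j) = i."""
    n = len(sigma)
    return [[1 if sigma[j] == i else 0 for j in range(n)] for i in range(n)]

def combo(weights_and_perms):
    """Weighted sum of M^lambda(sigma) over (weight, sigma) pairs."""
    n = len(weights_and_perms[0][1])
    out = [[0] * n for _ in range(n)]
    for w, sigma in weights_and_perms:
        m = perm_matrix(sigma)
        for i in range(n):
            for j in range(n):
                out[i][j] += w * m[i][j]
    return out

# n = 4 with elements relabelled 0..3: (1,2) swaps 0,1 and (3,4) swaps 2,3.
s1 = (1, 0, 2, 3)
s2 = (0, 1, 3, 2)
s3 = (1, 0, 3, 2)
s4 = (0, 1, 2, 3)

identity_holds = combo([(1, s1), (1, s2)]) == combo([(1, s3), (1, s4)])

# Case 1 with hypothetical integer weights satisfying p1 < p2.
p1, p2, p3, p4 = 1, 3, 2, 5
f_hat = combo([(p1, s1), (p2, s2), (p3, s3), (p4, s4)])
g_hat = combo([(p2 - p1, s2), (p3 + p1, s3), (p4 + p1, s4)])
print(identity_holds, f_hat == g_hat)
```

Both comparisons succeed, so the 3-sparse $g$ is indistinguishable from the 4-sparse $f$ on the basis of first-order marginals alone.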

Question Two. The resolution of the first question will 
provide a way to recover $f$ by means of solving the $\ell_0$ 
optimization problem in (1). However, in general, it is 
computationally a hard problem. Therefore, we wish to obtain a 
simple and possibly iterative algorithm to recover $f$ (and hence 
for solving (1)). 

Question Three. Once we identify the conditions for exact 
recovery of $f$, the next natural question to ask is "how 
restrictive are the conditions we imposed on $f$ for exact 
recovery?" In other words, as mentioned above, we know that 
the sufficient conditions don't translate to a simple sparsity 
bound on functions; however, can we find a sparsity bound 
such that "most," if not all, functions that satisfy the sparsity 
bound can be recovered? We make the notion of "most" 
functions precise by proposing a natural random generative 
model for functions with a given sparsity. Then, for a given 
partition $\lambda$, we want to obtain $K(\lambda)$ so that if $K < K(\lambda)$ then 
recovery of $f$ generated according to the generative model 
from $\hat{f}(\lambda)$ is possible with high probability. 



This question is essentially an inquiry into whether the 
situation demonstrated by Example II-C.1 is contrived or not. 
In other words, it is an inquiry into whether such exam- 
ples happen with vanishingly low probability for a randomly 
chosen function. To this end, we describe a natural random 
function generation model. 

Definition II.2 (Random Model). Given $K \in \mathbb{Z}_+$ and an 
interval $\mathcal{C} = [a, b]$, $0 < a < b$, a random function $f$ with 
sparsity $K$ and values in $\mathcal{C}$ is generated as follows: choose $K$ 
permutations from $S_n$ independently and uniformly at random,² 
say $\sigma_1, \ldots, \sigma_K$; select $K$ values from $\mathcal{C}$ uniformly at 
random, say $p_1, \ldots, p_K$; then the function $f$ is defined as 

$$f(\sigma) = \begin{cases} p_i, & \text{if } \sigma = \sigma_i, \ 1 \leq i \leq K, \\ 0, & \text{otherwise.} \end{cases}$$

We will denote this model as $R(K, \mathcal{C})$. 
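A minimal sampler for this random model might look as follows. The dictionary encoding, the function name and the treatment of repeated permutations are assumptions of this sketch: since the draws are independent, the same permutation can in principle be drawn twice, and the definition does not say what value it should then receive, so the sketch simply keeps the last one.

```python
import random

def sample_random_model(n, K, a, b, rng=None):
    """Draw f from R(K, C) with C = [a, b]; returns a dict sigma -> value.

    Permutations are drawn independently and uniformly (with replacement),
    so a repeated draw leaves fewer than K distinct support points; in that
    case this sketch keeps the last value drawn for that permutation.
    """
    rng = rng or random.Random()
    f = {}
    for _ in range(K):
        sigma = list(range(n))
        rng.shuffle(sigma)                 # uniform random permutation
        f[tuple(sigma)] = rng.uniform(a, b)
    return f

f = sample_random_model(n=6, K=4, a=1.0, b=2.0, rng=random.Random(0))
for sigma, p in f.items():
    print(sigma, round(p, 3))
```

For $K$ much smaller than $n!$, repeated draws are rare, so the support size equals $K$ in almost every sample.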

Question Four. Can we characterize a limitation on the 
ability of any algorithm to recover $f$ from $\hat{f}(\lambda)$? 

III. Main Results 

As the main result of this paper, we provide answers to the 
four questions stated in Section II-C. We start with recalling 
some notations. Let $\lambda = (\lambda_1, \ldots, \lambda_r)$ be the given partition of 
$n$. We wish to recover function $f : S_n \to \mathbb{R}_+$ from available 
information $\hat{f}(\lambda)$. Let the sparsity of $f$ be $K$, 

$$\operatorname{supp}(f) = \{\sigma_1, \ldots, \sigma_K\}, \quad \text{and} \quad f(\sigma_k) = p_k, \ 1 \leq k \leq K.$$

Answers One & Two. To answer the first two questions, we 
need to find sufficiency conditions for recovering $f$ through $\ell_0$ 
optimization (1) and a simple algorithm to recover the function. 
For that, we first try to gain a qualitative understanding of 
the conditions that $f$ must satisfy. Note that a necessary condition 
for $\ell_0$ optimization to recover $f$ is that (1) must have a 
unique solution; otherwise, without any additional information, 
we wouldn't know which of the multiple solutions is the true 
solution. Clearly, since $\hat{f}(\lambda) = \sum_{\sigma \in S_n} f(\sigma) M^\lambda(\sigma)$, (1) will 
have a unique solution only if $\{M^\lambda(\sigma)\}_{\sigma \in \operatorname{supp}(f)}$ is linearly 
independent. However, this linear independence condition is, 
in general, not sufficient to guarantee a unique solution; in 
particular, even if $\{M^\lambda(\sigma)\}_{\sigma \in \operatorname{supp}(f)}$ is linearly independent, 
there could exist a set $\mathcal{H} \subset S_n$ with $|\mathcal{H}| \leq K$, where $K := |\operatorname{supp}(f)|$, 
such that $\hat{f}(\lambda)$ is also a combination of $\{M^\lambda(\sigma')\}_{\sigma' \in \mathcal{H}}$; 
Example II-C.1 illustrates such a scenario. Thus, a sufficient 
condition for $f$ to be the unique sparsest solution of (1) is 
that not only is $\{M^\lambda(\sigma)\}_{\sigma \in \operatorname{supp}(f)}$ linearly independent, but 
$\{M^\lambda(\sigma), M^\lambda(\sigma')\}_{\sigma \in \operatorname{supp}(f), \sigma' \in \mathcal{H}}$ is linearly independent for 
all $\mathcal{H} \subset S_n$ such that $|\mathcal{H}| \leq K$; in other words, not only 
do we want $M^\lambda(\sigma)$ for $\sigma \in \operatorname{supp}(f)$ to be linearly independent, 
but we want them to be linearly independent even after the 
addition of at most $K$ permutations to the support of $f$. 
Note that this condition is similar to the Restricted Isometry 
Property (RIP) introduced in [10], which roughly translates 
to the property that $\ell_0$ optimization recovers $x$ of sparsity 
$K$ from $y = Ax$ provided every subset of $2K$ columns of 
$A$ is linearly independent. Motivated by this, we impose the 
following conditions on $f$. 

²Throughout, we will assume that the random selection is done with 
replacement. 

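Condition 1 (Sufficiency Conditions). Let $f$ satisfy the following: 

• Unique Witness: for any $\sigma \in \operatorname{supp}(f)$, there exist $1 \leq i_\sigma, j_\sigma \leq D_\lambda$ 
such that $M^\lambda_{i_\sigma j_\sigma}(\sigma) = 1$, but $M^\lambda_{i_\sigma j_\sigma}(\sigma') = 0$, 
for all $\sigma' (\neq \sigma) \in \operatorname{supp}(f)$. 
• Linear Independence: for any collection of integers 
$c_1, \ldots, c_K$ taking values in $\{-K, \ldots, K\}$, $\sum_{k=1}^{K} c_k p_k \neq 0$, 
unless $c_1 = \cdots = c_K = 0$. 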

The above discussion motivates the "unique witness" condition; 
indeed, $M^\lambda(\sigma)$ for $\sigma$ satisfying the "unique witness" 
condition are linearly independent because every permutation 
has a unique witness and no non-zero linear combination of 
$M^\lambda(\sigma)$ can yield zero. On the other hand, as shown in the 
proof of Theorem III.1, the linear independence condition is 
required for the uniqueness of the sparsest solution. 
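Both parts of Condition 1 can be tested by brute force on small instances. The sketch below handles only $\lambda = (n-1, 1)$, where a witness for $\sigma$ is a position $j$ whose image $\sigma(j)$ differs from $\sigma'(j)$ for every other $\sigma'$ in the support. The function names, the floating-point tolerance and the example values are assumptions introduced here, and the second check is exponential in $K$, so it is meant purely as an illustration.

```python
from itertools import product

def has_unique_witness(support):
    """Unique Witness check for lambda = (n-1, 1).

    For each sigma we need a pair (i, j) with sigma(j) = i that no other
    permutation in the support shares.
    """
    n = len(support[0])
    for sigma in support:
        others = [s for s in support if s != sigma]
        if not any(all(s[j] != sigma[j] for s in others) for j in range(n)):
            return False
    return True

def values_linearly_independent(values, tol=1e-9):
    """Brute-force check: no non-zero integer vector c with entries in
    {-K, ..., K} gives sum_k c_k p_k = 0 (up to tol). Tiny K only."""
    K = len(values)
    for c in product(range(-K, K + 1), repeat=K):
        if any(c) and abs(sum(ck * p for ck, p in zip(c, values))) < tol:
            return False
    return True

# Permutations of Example II-C.1 for n = 4, elements relabelled 0..3.
s1, s2, s3, s4 = (1, 0, 2, 3), (0, 1, 3, 2), (1, 0, 3, 2), (0, 1, 2, 3)

print(has_unique_witness([s1, s2, s3, s4]))        # full support of case 1
print(has_unique_witness([s1, s2]))                # support of case 2
print(values_linearly_independent([0.5, 0.5]))     # equal values, as in case 2
print(values_linearly_independent([1.0, 2 ** 0.5]))
```

On these inputs the four-permutation support has no unique witnesses, while the two-permutation support of case 2 does have them and fails only the linear independence part because its two values are equal.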

Now we state a formal result that establishes Condition 1 
as sufficient for recovery of $f$ as the unique solution of the $\ell_0$ 
optimization problem. Further, it allows for a simple, iterative 
recovery algorithm. Thus, Theorem III.1 provides answers to 
Questions One and Two of Section II-C. 

Theorem III.1. Under Condition 1, the function $f$ is the 
unique solution of the $\ell_0$ optimization problem (1). Further, 
a simple, iterative algorithm called the sparsest-fit algorithm, 
described in Section IV, recovers $f$. 

Linear Programs Don't Work. Theorem III.1 states that
under Condition 1 the $\ell_0$ optimization recovers $f$ and the
sparsest-fit algorithm is a simple iterative algorithm to recover
it. In the context of compressive sensing literature (cf. [11],
[13], [14], [21]), it has been shown that convex relaxations
of $\ell_0$ optimization, such as the Linear Programming relaxation,
have the same solution in similar scenarios. Therefore, it is
natural to wonder whether such a relaxation would work in our
case. To this end, consider the following Linear Programming
relaxation of (1), stated as the following $\ell_1$ minimization
problem:

$$\text{minimize } \|g\|_1 \text{ over } g : S_n \to \mathbb{R}_+ \quad \text{subject to } \hat g(\lambda) = \hat f(\lambda). \tag{2}$$

Example II-C.1 provides a scenario where $\ell_1$ relaxation fails
in recovery. In fact, we can prove a stronger result. The
following result establishes that, with a high probability,
a function generated randomly as per Definition II.2 cannot
be recovered by solving the linear program (2) because there
exists a function $g$ such that $\hat g(\lambda) = \hat f(\lambda)$ and $\|g\|_1 = \|f\|_1$.

Theorem III.2. Consider a function $f$ randomly generated
as per Definition II.2 with sparsity $K \ge 2$. Then, as long as
$\lambda$ is not the partition $(1, 1, \ldots, 1)$ ($n$ times), with probability
$1 - o(1)$, there exists a function $g$ distinct from $f$ such that
$\hat g(\lambda) = \hat f(\lambda)$ and $\|g\|_1 \le \|f\|_1$.



Answer Three. Next, we turn to the third question. Specifically,
we study the conditions for high probability recoverability
of a random function $f$ in terms of its sparsity. That is, we
wish to identify the high probability recoverability threshold
$K(\lambda)$. In what follows, we spell out the result starting with
a few specific cases so as to better explain the dependency of
$K(\lambda)$ on $D_\lambda$.

Case 1: $\lambda = (n-1, 1)$. Here $D_\lambda = n$ and $\hat f(\lambda)$ provides
the first order marginal information. As stated next, for this
case the achievable recoverability threshold $K(\lambda)$ scales as
$n \log n$.

Theorem III.3. A randomly generated $f$ as per Definition II.2
can be recovered by the sparsest-fit algorithm with probability
$1 - o(1)$ as long as $K \le (1 - \varepsilon) n \log n$ for any fixed $\varepsilon > 0$.

Case 2: $\lambda = (n-m, m)$ with $1 < m = O(1)$. Here $D_\lambda =
\Theta(n^m)$ and $\hat f(\lambda)$ provides the $m$th order marginal information.
As stated next, for this case we find that $K(\lambda)$ scales at least
as $n^m \log n$.

Theorem III.4. A randomly generated $f$ as per Definition II.2
can be recovered from $\hat f(\lambda)$ by the sparsest-fit algorithm for
$\lambda = (n-m, m)$, $m = O(1)$, with probability $1 - o(1)$ as long
as $K \le \frac{(1-\varepsilon)}{m!}\, n^m \log n$ for any fixed $\varepsilon > 0$.

In general, for any $\lambda$ with $\lambda_1 = n - m$ and $m = O(1)$,
arguments of Theorem III.4 can be adapted to show that $K(\lambda)$
scales as $n^m \log n$. Theorems III.3 and III.4 suggest that the
recoverability threshold scales as $D_\lambda \log D_\lambda$ for $\lambda = (\lambda_1, \ldots, \lambda_r)$
with $\lambda_1 = n - m$ for $m = O(1)$. Next, we consider the case
of more general $\lambda$.

Case 3: $\lambda = (\lambda_1, \ldots, \lambda_r)$ with $\lambda_1 = n - O(n^{2/9 - \delta})$ for
any $\delta > 0$. As stated next, for this case, the recoverability
threshold $K(\lambda)$ scales at least as $D_\lambda \log\log D_\lambda$.

Theorem III.5. A randomly generated $f$ as per Definition II.2
can be recovered from $\hat f(\lambda)$ by the sparsest-fit algorithm for
$\lambda = (\lambda_1, \ldots, \lambda_r)$ with $\lambda_1 = n - n^{2/9 - \delta}$ for any $\delta > 0$, with
probability $1 - o(1)$ as long as $K \le (1 - \varepsilon) D_\lambda \log\log D_\lambda$ for
any fixed $\varepsilon > 0$.

Case 4: Any $\lambda = (\lambda_1, \ldots, \lambda_r)$. The results stated thus
far suggest that the threshold is essentially $D_\lambda$, ignoring the
logarithm term. For general $\lambda$, we establish a bound on $K(\lambda)$
as stated in Theorem III.6 below. Before stating the result, we
introduce some notation. For given $\lambda$, define $\alpha = (\alpha_1, \ldots, \alpha_r)$
with $\alpha_i = \lambda_i / n$, $1 \le i \le r$. Let

$$H(\alpha) = -\sum_{i=1}^{r} \alpha_i \log \alpha_i, \quad \text{and} \quad H'(\alpha) = -\sum_{i=2}^{r} \alpha_i \log \alpha_i.$$

Theorem III.6. Given $\lambda = (\lambda_1, \ldots, \lambda_r)$, a randomly generated
$f$ as per Definition II.2 can be recovered from $\hat f(\lambda)$ by
the sparsest-fit algorithm with probability $1 - o(1)$ as long as

$$K \le C D_\lambda^{\gamma(\alpha)}, \tag{3}$$

where

$$\gamma(\alpha) = \frac{M}{M+1}\left[1 - C'\,\frac{H(\alpha) - H'(\alpha)}{H(\alpha)}\right], \quad \text{with } M = \left\lfloor \frac{1}{1 - \alpha_1} \right\rfloor,$$

and $0 < C, C' < \infty$ are constants.

Footnote: Throughout this paper, by log we mean the natural logarithm, i.e. $\log_e$,
unless otherwise stated.

At first glance, the above result seems very different
from the crisp formulas of Theorems III.3-III.5. Therefore,
let us consider a few special cases. First, observe that as
$\alpha_1 \uparrow 1$, $M/(M+1) \to 1$. Further, as stated in Lemma III.1,
$H'(\alpha)/H(\alpha) \to 1$. Thus, we find that the bound on sparsity
essentially scales as $D_\lambda$. Note that cases 1, 2 and 3 fall
squarely under this scenario since $\alpha_1 = \lambda_1/n = 1 - o(1)$.
Thus, this general result contains the results of Theorems III.3-III.5
(ignoring the logarithm terms).

Next, consider the other extreme of $\alpha_1 \downarrow 0$. Then, $M \to 1$
and again by Lemma III.1, $H'(\alpha)/H(\alpha) \to 1$. Therefore, the
bound on sparsity scales as $\sqrt{D_\lambda}$. This ought to be the case
because for $\lambda = (1, \ldots, 1)$ we have $\alpha_1 = 1/n \to 0$, $D_\lambda =
n!$, and the unique witness property holds only up to $o(\sqrt{D_\lambda}) =
o(\sqrt{n!})$ due to the standard Birthday paradox.

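In summary, Theorem III.6 appears reasonably tight for
the general form of partial information $\lambda$. We now state
Lemma III.1 used above (proof in Appendix A).

Lemma III.1. Consider any $\alpha = (\alpha_1, \ldots, \alpha_r)$ with $1 \ge \alpha_1 \ge
\cdots \ge \alpha_r \ge 0$ and $\sum_{i=1}^{r} \alpha_i = 1$. Then,

$$\lim_{\alpha_1 \uparrow 1} \frac{H'(\alpha)}{H(\alpha)} = 1, \qquad \lim_{\alpha_1 \downarrow 0} \frac{H'(\alpha)}{H(\alpha)} = 1.$$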

Answer Four. Finally, we wish to understand the fundamental
limitation on the ability to recover $f$ from $\hat f(\lambda)$ by
any algorithm. To obtain a meaningful bound (cf. Example
II-C.1), we shall examine this question under an appropriate
information theoretic setup.

To this end, as in random model $R(K, \mathcal{C})$, consider a
function $f$ generated with given $K$ and $\lambda$. For technical
reasons (or limitations), we will assume that the values $p_i$
are chosen from a discrete set. Specifically, let each $p_i$ be
chosen from the integers $\{1, \ldots, T\}$ instead of the compact set $\mathcal{C}$.
We will denote this random model by $R(K, T)$.

Consider any algorithm that attempts to recover $f$ from
$\hat f(\lambda)$ under $R(K, T)$. Let $h$ be the estimate produced by the algorithm.
Define the probability of error of the algorithm as

$$P_{\mathrm{err}} = \Pr(h \ne f).$$

We state the following result.

Theorem III.7. With respect to random model $R(K, T)$, the
probability of error is uniformly bounded away from $0$ for all
$n$ large enough and any $\lambda$, if

$$K \ge \frac{3 D_\lambda^2}{n \log n}\, \log\left(\frac{D_\lambda^2}{n \log n} \vee T\right), \tag{4}$$

where for any two numbers $x$ and $y$, $x \vee y$ denotes $\max\{x, y\}$.



IV. Sparsest-fit algorithm 

As mentioned above, finding the sparsest distribution that
is consistent with the given partial information is in general a
computationally hard problem. In this section, we propose an
efficient algorithm to fit the sparsest distribution to the given
partial information $\hat f(\lambda)$, for any partition $\lambda$ of $n$. The
algorithm we propose determines the sparsest distribution exactly
as long as the underlying distribution belongs to the general
family of distributions that satisfy the 'unique witness' and
'linear independence' conditions; we call this the 'sparsest-fit'
algorithm. In this case, it follows from Theorem III.1
that the 'sparsest-fit' algorithm indeed recovers the underlying
distribution $f(\cdot)$ exactly from partial information $\hat f(\lambda)$. When
the conditions are not satisfied, the algorithm produces a
certificate to that effect and aborts.

Using the degree $D_\lambda$ representation of the permutations, the
algorithm processes the elements of the partial information
matrix $\hat f(\lambda)$ sequentially and incrementally builds the
permutations in the support. We describe the sparsest-fit algorithm
as a general procedure to recover a set of non-negative values
given sums of these values over a collection of subsets, which
for brevity we call subset sums. In this sense, it can be thought
of as a linear equation solver customized for a special class
of systems of linear equations.

Next we describe the algorithm in detail and prove the
relevant theorems.

A. Sparsest-fit algorithm 

We now describe the sparsest-fit algorithm that was also
referred to in Theorems III.1 and III.3-III.6 to recover function $f$
from $\hat f(\lambda)$ under Condition 1.

Setup. The formal description of the algorithm is given in
Fig. 1. The algorithm is described there as a generic procedure
to recover a set of non-negative values given a collection of
their subset sums. As explained in Fig. 1, the inputs to the
algorithm are $L$ positive numbers $q_1, \ldots, q_L$ sorted in ascending
order $q_1 \le q_2 \le \cdots \le q_L$. As stated in assumptions C1-C3 in
Fig. 1, the algorithm assumes that the $L$ numbers are different
subset sums of $K$ distinct positive numbers $p_1, \ldots, p_K$, i.e.,
$q_\ell = \sum_{k \in T_\ell} p_k$ for some $T_\ell \subseteq \{1, 2, \ldots, K\}$, and the values
and subsets satisfy the conditions: for each $1 \le k \le K$,
$p_k = q_\ell$ for some $1 \le \ell \le L$, and $\sum_{k \in T} p_k \ne \sum_{k \in T'} p_k$ for
$T \ne T'$. Given this setup, the sparsest-fit algorithm recovers
the values $p_k$ and subset membership sets $A_k := \{\ell : k \in T_\ell\}$
for $1 \le k \le K$ using the $q_\ell$, but without any knowledge of $K$ or
the subsets $T_\ell$, $1 \le \ell \le L$.

Before we describe the algorithm, note that in order to use
the sparsest-fit algorithm to recover $f(\cdot)$ we give the non-zero
elements of the partial information matrix $\hat f(\lambda)$ as inputs $q_\ell$.
In this case, $L$ equals the number of non-zero entries of $\hat f(\lambda)$,
$p_k = f(\sigma_k)$, and the sets $A_k$ correspond to $M^\lambda(\sigma_k)$. Here,
assumption C1 of the algorithm is trivially satisfied. As we
argue in Section V, assumptions C2, C3 are implied by the
'unique witness' and 'linear independence' conditions.

Description. The formal description is given in
Fig. 1. The algorithm processes elements $q_1, q_2, \ldots, q_L$
sequentially and builds membership sets incrementally. It
maintains the number of non-empty membership sets at the
end of each iteration $\ell$ as $k(\ell)$. Partial membership sets
are maintained as sets $A_k$, $1 \le k \le k(\ell)$, which at the end of
iteration $\ell$ equal $\{\ell' \le \ell : k \in T_{\ell'}\}$. The values
found are maintained as $p_1, p_2, \ldots, p_{k(\ell)}$. The value of $k(0)$
is initialized to zero and the sets $A_k$ are initialized to be empty.

In each iteration $\ell$, the algorithm checks if the value $q_\ell$ can
be written as a subset sum of values $p_1, p_2, \ldots, p_{k(\ell-1)}$ for
some subset $T$. If $q_\ell$ can be expressed as $\sum_{k \in T} p_k$ for some
$T \subseteq \{1, 2, \ldots, k(\ell-1)\}$, then the algorithm adds $\ell$ to the sets
$A_k$ for $k \in T$ and updates $k(\ell)$ as $k(\ell) = k(\ell-1)$ before
ending the iteration. In case there exists no such subset $T$, the
algorithm updates $k(\ell)$ as $k(\ell-1) + 1$, makes the set $A_{k(\ell)}$
non-empty by adding $\ell$ to it, and sets $p_{k(\ell)}$ to $q_\ell$. At the end
the algorithm outputs $(p_k, A_k)$ for $1 \le k \le k(L)$.



We now argue that under assumptions C1-C3 stated in
Fig. 1, the algorithm finds $(p_k, A_k)$ for $1 \le k \le K$ accurately.
Note that by Assumption C2, there exists at least one $q_\ell$ such
that it is equal to $p_k$, for each $1 \le k \le K$. Assumption C3
guarantees that the condition in the if statement is not satisfied
whenever $q_\ell = p_{k(\ell)}$. Therefore, the algorithm correctly
assigns values to each of the $p_k$s. Note that the condition in
the if statement being true implies that $q_\ell$ is a subset sum
of some subset $T \subseteq \{p_1, p_2, \ldots, p_{k(\ell-1)}\}$. Assumption C3
ensures that if such a combination exists then it is unique.
Thus, when the condition is satisfied, index $\ell$ belongs only
to the sets $A_k$ such that $k \in T$. When the condition in the
if statement is false, then from Assumptions C2 and C3 it
follows that $\ell$ is contained only in $A_{k(\ell)}$. From this discussion
we conclude that the sparsest-fit algorithm correctly assigns all
the indices to each of the $A_k$s. Thus, the algorithm recovers
$p_k, A_k$ for $1 \le k \le K$ under Assumptions C1, C2 and C3.
We summarize it in the following Lemma.

Lemma IV.1. The sparsest-fit algorithm recovers $p_k, A_k$ for
$1 \le k \le K$ under Assumptions C1, C2 and C3.
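
The procedure of Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper name `find_subset` is hypothetical, the subset-sum step is a brute-force search (exponential in the number of values found so far), exact arithmetic (integers or fractions) is assumed so that equality tests are meaningful, and indices are 0-based into the sorted input rather than 1-based as in Fig. 1.

```python
from itertools import combinations

def find_subset(values, target):
    """Return indices of a non-empty subset of `values` summing to `target`,
    or None if no such subset exists (brute force over all subsets)."""
    for size in range(1, len(values) + 1):
        for subset in combinations(range(len(values)), size):
            if sum(values[k] for k in subset) == target:
                return subset
    return None

def sparsest_fit(q):
    """Recover values p_k and membership sets A_k from subset sums q.
    Assumes C1-C3 of Fig. 1 hold; no certificate is produced otherwise."""
    q = sorted(q)
    p = []   # recovered values, in the order they are discovered
    A = []   # A[k] = set of positions l in sorted q whose sum uses p[k]
    for l, q_l in enumerate(q):
        subset = find_subset(p, q_l)
        if subset is not None:
            # q_l is explained by already-known values: record membership
            for k in subset:
                A[k].add(l)
        else:
            # q_l cannot be explained: it must be a new value p_k
            p.append(q_l)
            A.append({l})
    return p, A

# Example: hidden values 2, 5, 11 and a few of their subset sums
p, A = sparsest_fit([7, 2, 18, 5, 11, 13, 5])
```

In this example the sorted input is 2, 5, 5, 7, 11, 13, 18, and the sketch recovers the values 2, 5 and 11 together with the positions of the sums in which each participates.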

Complexity of the algorithm. Initially, we sort at most $D_\lambda^2$
elements. This has a complexity of $O(D_\lambda^2 \log D_\lambda)$. Further,
note that the for loop in the algorithm iterates at most $D_\lambda^2$
times. In each iteration, we are solving a subset-sum problem.
Since there are at most $K$ elements, the worst-case complexity
of subset-sum in each iteration is $O(2^K)$. Thus, the worst-case
complexity of the algorithm is $O(D_\lambda^2 \log D_\lambda + D_\lambda^2 2^K)$. However,
using the standard balls and bins argument, we can prove
that for $K = O(D_\lambda \log D_\lambda)$, with a high probability, there
are at most $O(\log D_\lambda)$ elements in each subset-sum problem.
Thus, the complexity would then be $O(\exp(\log^2 D_\lambda))$ with a
high probability.

V. Proof of Theorem III.1

The proof of Theorem III.1 requires us to establish two
claims: under Condition 1, (i) the sparsest-fit algorithm finds
$f$ and (ii) the $\ell_0$ optimization (1) has $f$ as its unique solution.
We establish these two claims in that order.

The sparsest-fit algorithm works. As noted in Section IV,
the sparsest-fit algorithm can be used to recover $f$ from
$\hat f(\lambda)$. As per Lemma IV.1, the correctness of the sparsest-fit
algorithm follows under Assumptions C1, C2 and C3.
Assumption C1 is trivially satisfied in the context of recovering
$f$ from $\hat f(\lambda)$ as discussed in Section IV. Next, we show that
Condition 1 implies C2 and C3. Note that the unique witness
property of Condition 1 implies C2 while C3 is a direct implication
of the linear independence property of Condition 1. Therefore, we have
established that the sparsest-fit algorithm recovers $f$ from $\hat f(\lambda)$
under Condition 1.

Unique Solution of $\ell_0$ Optimization. To arrive at a contradiction,
assume that there exists a function $g : S_n \to \mathbb{R}_+$ such
that $\hat g(\lambda) = \hat f(\lambda)$ and $L := \|g\|_0 \le \|f\|_0 = K$. Let

$$\mathrm{supp}(f) = \{\sigma_k \in S_n : 1 \le k \le K\}, \quad f(\sigma_k) = p_k, \ 1 \le k \le K,$$

$$\mathrm{supp}(g) = \{\rho_\ell \in S_n : 1 \le \ell \le L\}, \quad g(\rho_\ell) = q_\ell, \ 1 \le \ell \le L.$$

By hypothesis of Theorem III.1, $f$ satisfies Condition 1. Therefore,
the entries of matrix $\hat f(\lambda)$ contain the values $p_1, p_2, \ldots, p_K$.
Also, by our assumption $\hat f(\lambda) = \hat g(\lambda)$. Now, by definition,
each entry of the matrix $\hat g(\lambda)$ is a summation of a subset of the
$L$ numbers $q_\ell$, $1 \le \ell \le L$. Therefore, it follows that for each



$k$, $1 \le k \le K$, we have

$$p_k = \sum_{j \in T_k} q_j, \quad \text{for some } T_k \subseteq \{1, 2, \ldots, L\}.$$

Equivalently,

$$p = Aq, \tag{5}$$

where $p = [p_k]_{1 \le k \le K}$, $q = [q_\ell]_{1 \le \ell \le L}$, and $A \in \{0, 1\}^{K \times L}$.

Input: Positive values $\{q_1, q_2, \ldots, q_L\}$ sorted in ascending
order, i.e., $q_1 \le q_2 \le \cdots \le q_L$.

Assumptions: $\exists$ positive values $\{p_1, p_2, \ldots, p_K\}$ such that:
C1. For each $1 \le \ell \le L$, $q_\ell = \sum_{k \in T_\ell} p_k$ for some $T_\ell \subseteq \{1, 2, \ldots, K\}$.
C2. For each $1 \le k \le K$, there exists a $q_\ell$ such that $q_\ell = p_k$.
C3. $\sum_{k \in T} p_k \ne \sum_{k' \in T'} p_{k'}$, for all $T, T' \subseteq \{1, 2, \ldots, K\}$
    and $T \cap T' = \emptyset$.

Output: $\{p_1, p_2, \ldots, p_K\}$, and for all $1 \le k \le K$ the set $A_k$ s.t.
$A_k = \{\ell : q_\ell = \sum_{j \in T} p_j$ and index $k$ belongs to set $T\}$.

Algorithm:

initialization: $p_0 = 0$, $k(0) = 0$, $A_k = \emptyset$ for all possible $k$.
for $\ell = 1$ to $L$
    if $q_\ell = \sum_{k \in T} p_k$ for some $T \subseteq \{0, 1, \ldots, k(\ell-1)\}$
        $k(\ell) = k(\ell-1)$
        $A_k = A_k \cup \{\ell\}$ for all $k \in T$
    else
        $k(\ell) = k(\ell-1) + 1$
        $p_{k(\ell)} = q_\ell$
        $A_{k(\ell)} = A_{k(\ell)} \cup \{\ell\}$
    end if
end for

Output $K = k(L)$ and $(p_k, A_k)$, $1 \le k \le K$.

Fig. 1. Sparsest-fit algorithm

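Now consider the matrix $\hat f(\lambda)$. As noted before, each of its
entries is a summation of a subset of the numbers $p_k$, $1 \le k \le K$.
Further, each $p_k$, $1 \le k \le K$, contributes to exactly $D_\lambda$ distinct
entries of $\hat f(\lambda)$. Therefore, it follows that the summation of
all entries of $\hat f(\lambda)$ is $D_\lambda (p_1 + \cdots + p_K)$. That is,

$$\sum_{i,j} \hat f(\lambda)_{ij} = D_\lambda \left(\sum_{k=1}^{K} p_k\right).$$

Similarly,

$$\sum_{i,j} \hat g(\lambda)_{ij} = D_\lambda \left(\sum_{\ell=1}^{L} q_\ell\right).$$

But $\hat f(\lambda) = \hat g(\lambda)$. Therefore,

$$p \cdot \mathbf{1} = q \cdot \mathbf{1}, \tag{6}$$

where $\mathbf{1}$ is the vector of all 1s of appropriate dimension (we have
abused the notation $\mathbf{1}$ here): in the LHS, it is of dimension $K$, in
the RHS it is of dimension $L$. Also, from (5) we have

$$p \cdot \mathbf{1} = Aq \cdot \mathbf{1} = \sum_{\ell=1}^{L} c_\ell q_\ell, \tag{7}$$

for some $c_\ell \in \mathbb{Z}_+$. From (6) and (7), it follows that

$$\sum_{j} q_j = \sum_{j} c_j q_j. \tag{8}$$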

Now, there are two options: (1) either all the $c_\ell$s are $> 0$, or (2)
some of them are equal to zero. In case (1), when $c_\ell > 0$
for all $1 \le \ell \le L$, it follows that $c_\ell = 1$ for each $1 \le \ell \le L$; or
else, the RHS of (8) will be strictly larger than the LHS since $q_\ell > 0$
for all $1 \le \ell \le L$ by definition. Therefore, the matrix $A$ in (5)
must contain exactly one non-zero entry, i.e. 1, in each column.
Since $p_k > 0$ for all $1 \le k \le K$, it follows that there must
be at least $K$ non-zero entries in $A$. Finally, since $L \le K$, it
follows that we must have $L = K$. In summary, it must be that
$A$ is a $K \times K$ matrix with each row and column having exactly
one 1, and the rest of the entries 0. That is, $A$ is a permutation
matrix, and $p_k$, $1 \le k \le K$, is a permutation of $q_1, \ldots, q_L$
with $L = K$. By relabeling the $q_\ell$s, if required, without loss
of generality, we assume that $p_k = q_k$ for $1 \le k \le K$.
Since $\hat g(\lambda) = \hat f(\lambda)$ and $p_k = q_k$ for $1 \le k \le K$, it follows
that $g$ also satisfies Condition 1. Therefore, the sparsest-fit
algorithm accurately recovers $g$ from $\hat g(\lambda)$. Since the input to
the algorithm is only $\hat g(\lambda)$ and $\hat g(\lambda) = \hat f(\lambda)$, it follows that
$g = f$ and we have reached a contradiction to our assumption
that $f$ is not the unique solution of optimization problem (1).
Now consider the remaining case (2) and suppose that $c_\ell = 0$
for some $\ell$. Then, it follows that some of the columns in
the $A$ matrix are zeros. Removing those columns of $A$ we can
write

$$p = \tilde A \tilde q,$$

where $\tilde A$ is formed from $A$ by removing the zero columns and
$\tilde q$ is formed from $q$ by removing the $q_\ell$s such that $c_\ell = 0$. Let $\tilde L$
be the size of $\tilde q$. Since at least one column was removed,
$\tilde L < L \le K$. The condition $\tilde L < K$ implies that the
vector $p$ lies in a lower dimensional space. Further, $\tilde A$ is a
0, 1 valued matrix. Therefore, it follows that $p$ violates the
linear independence property of Condition 1, resulting in a
contradiction. This completes the proof of Theorem III.1.

VI. Proof of Theorem III.2

We prove this theorem by showing that when two permutations,
say $\sigma_1, \sigma_2$, are chosen uniformly at random, with
a high probability, the sum of their representation matrices
$M^\lambda(\sigma_1) + M^\lambda(\sigma_2)$ can be decomposed in at least two ways.
For that, note that a permutation can be represented using
cycle notation, e.g. for $n = 4$, the permutation $1 \mapsto 2$, $2 \mapsto 1$,
$3 \mapsto 4$, $4 \mapsto 3$ can be represented as a composition of two
cycles $(12)(34)$. We call two cycles distinct if they have no
elements in common, e.g. the cycles $(12)$ and $(34)$ are distinct.
Given two permutations $\sigma_1$ and $\sigma_2$, let $\sigma_{1,2} = \sigma_1 \sigma_2$ be their
composition.

Now consider two permutations $\sigma_1$ and $\sigma_2$ such that they
have distinct cycles. For example, $\sigma_1 = (1, 2)$ and $\sigma_2 = (3, 4)$
are permutations with distinct cycles. Then $\sigma_{1,2} = \sigma_1 \sigma_2 =
(12)(34)$. We first prove the theorem for $\lambda = (n-1, 1)$ and
then extend it to a general $\lambda$; thus, we fix the partition $\lambda =
(n-1, 1)$. Then, we have:

$$M^\lambda(\sigma_1) + M^\lambda(\sigma_2) = M^\lambda(\sigma_{1,2}) + M^\lambda(\mathrm{id}), \tag{9}$$

where $\sigma_1$ and $\sigma_2$ have distinct cycles and id is the identity
permutation. Now, assuming that $p_1 \le p_2$, consider the
following:

$$p_1 M^\lambda(\sigma_1) + p_2 M^\lambda(\sigma_2) = p_1 M^\lambda(\sigma_{1,2}) + p_1 M^\lambda(\mathrm{id}) + (p_2 - p_1) M^\lambda(\sigma_2).$$

Thus, given $\hat f(\lambda) = p_1 M^\lambda(\sigma_1) + p_2 M^\lambda(\sigma_2)$, it can be
decomposed in two distinct ways with both having the same
$\ell_1$ norm. Of course, the same analysis can be carried out when
$f$ has a sparsity $K$. Thus, we conclude that whenever $f$ has
two permutations with distinct cycles in its support, the $\ell_1$
minimization solution is not unique. Therefore, to establish the
claim of Theorem III.2, it is sufficient to prove that when
we choose two permutations uniformly at random, they have
distinct cycles with a high probability.
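
The two decompositions can be checked numerically on the example $\sigma_1 = (12)$, $\sigma_2 = (34)$ with $n = 4$. The sketch below is only an illustration: the helper names, the 0-indexed tuple encoding of permutations, and the weights $p_1 = 2$, $p_2 = 5$ are assumptions chosen for this example.

```python
def perm_matrix(sigma):
    """First-order matrix M with M[i][j] = 1 iff sigma(j) = i (0-indexed)."""
    n = len(sigma)
    return [[1 if sigma[j] == i else 0 for j in range(n)] for i in range(n)]

def compose(s1, s2):
    """Composition s1 s2, i.e. j -> s1(s2(j))."""
    return tuple(s1[s2[j]] for j in range(len(s1)))

def combo(terms, n):
    """Weighted sum of permutation matrices over (weight, sigma) pairs."""
    total = [[0] * n for _ in range(n)]
    for w, sigma in terms:
        M = perm_matrix(sigma)
        for i in range(n):
            for j in range(n):
                total[i][j] += w * M[i][j]
    return total

n = 4
identity = (0, 1, 2, 3)
s1 = (1, 0, 2, 3)        # the cycle (1 2) in 1-indexed notation
s2 = (0, 1, 3, 2)        # the cycle (3 4)
s12 = compose(s1, s2)    # (1 2)(3 4)
p1, p2 = 2, 5            # illustrative weights with p1 <= p2

f_hat = combo([(p1, s1), (p2, s2)], n)
g_hat = combo([(p1, s12), (p1, identity), (p2 - p1, s2)], n)
l1_f = p1 + p2
l1_g = p1 + p1 + (p2 - p1)
```

Both weighted sums give the same matrix and the two weight vectors have the same $\ell_1$ norm, so the first-order information alone cannot distinguish them by $\ell_1$ minimization.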

To this end, let $\mathcal{E}$ denote the event that two permutations
chosen uniformly at random have distinct cycles. Since
permutations are chosen uniformly at random, $\Pr(\mathcal{E})$ can be
computed by fixing one of the permutations to be id. Then,
$\Pr(\mathcal{E})$ is the probability that a permutation chosen at random
has more than one cycle.

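Let us evaluate $\Pr(\mathcal{E}^c)$. For that, consider a permutation
having exactly one cycle with the cycle containing $l$ elements.
The number of such permutations will be $\binom{n}{l}(l-1)!$. This is
because we can choose the $l$ elements that form the cycle in
$\binom{n}{l}$ ways and the $l$ numbers can be arranged in the cycle in
$(l-1)!$ ways. Therefore,

$$\Pr(\mathcal{E}^c) \le \frac{1}{n!} \sum_{l=1}^{n} \binom{n}{l} (l-1)! = \sum_{l=1}^{n} \frac{1}{l\,(n-l)!}. \tag{10}$$

Now, without loss of generality let us assume that $n$ is even.
Then,

$$\sum_{l=1}^{n/2} \frac{1}{l\,(n-l)!} \le \sum_{l=1}^{n/2} \frac{1}{(n/2)!} = \frac{1}{(n/2 - 1)!}. \tag{11}$$

The other half of the sum becomes

$$\sum_{l=n/2+1}^{n} \frac{1}{l\,(n-l)!} \le \frac{2}{n} \sum_{k=0}^{n/2-1} \frac{1}{k!} \le \frac{2}{n} \sum_{k=0}^{\infty} \frac{1}{k!} = \frac{O(1)}{n}. \tag{12}$$

Putting everything together, we have

$$\Pr(\mathcal{E}) \ge 1 - \Pr(\mathcal{E}^c) \ge 1 - \left(\frac{1}{(n/2 - 1)!} + \frac{O(1)}{n}\right) \to 1 \quad \text{as } n \to \infty.$$

Thus, Theorem III.2 is true for $\lambda = (n-1, 1)$.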
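
To get a feel for how fast this probability vanishes, the count $\binom{n}{l}(l-1)!$ divided by $n!$ simplifies to $1/(l\,(n-l)!)$, and the resulting sum over $l$ can be evaluated exactly. The sketch below is an illustration only; the function name is an assumption made for this example.

```python
from fractions import Fraction
from math import factorial

def single_cycle_bound(n):
    """Exact value of sum_{l=1}^{n} 1 / (l * (n - l)!), i.e. the sum of
    C(n, l) * (l - 1)! / n! over all cycle lengths l."""
    return sum(Fraction(1, l * factorial(n - l)) for l in range(1, n + 1))

# The bound is trivial for tiny n and decays roughly like a constant over n.
bounds = {n: float(single_cycle_bound(n)) for n in (10, 20, 40)}
```

The dominant terms come from $l$ close to $n$, which is why the sum behaves like $O(1)/n$ rather than decaying factorially.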

In order to extend the proof to a general $\lambda$, we observe that
the standard cycle notation for a permutation we discussed
above can be extended to $\lambda$ partitions for a general $\lambda$.
Specifically, for any given $\lambda$, observe that a permutation can
be imagined as a perfect matching in a $D_\lambda \times D_\lambda$ bipartite
graph, which we call the $\lambda$-bipartite graph and denote by
$G^\lambda = (V_1 \times V_2, E^\lambda)$; here $V_1$ and $V_2$ respectively denote
the left and right vertex sets with $|V_1| = |V_2| = D_\lambda$, with
a node for every $\lambda$ partition of $n$. Let $t_1, t_2, \ldots, t_{D_\lambda}$ denote
the $D_\lambda$ $\lambda$-partitions of $n$; then, the nodes in $V_1$ and $V_2$ can
be labeled by $t_1, t_2, \ldots, t_{D_\lambda}$. Since every perfect matching in
a bipartite graph can be decomposed into its corresponding
distinct cycles (the cycles can be obtained by superposing the
bipartite graph corresponding to the identity permutation with the
$\lambda$-bipartite graph of the permutation), every permutation can
be written as a combination of distinct cycles in its $\lambda$-bipartite
graph. The special case of this for $\lambda = (n-1, 1)$ is the standard
cycle notation we discussed above; for brevity, we call the
$\lambda$-bipartite graph for $\lambda = (n-1, 1)$ the standard bipartite graph.

In order to prove the theorem for a general $\lambda$, using an
argument similar to the one above, it can be shown that it is sufficient
to prove that a randomly chosen permutation contains at
least two distinct cycles in its $\lambda$-bipartite graph with a high
probability. For that, it is sufficient to prove that a permutation
with at least two distinct cycles in its standard bipartite graph
has at least two distinct cycles in its $\lambda$-bipartite graph for
any general $\lambda$. The theorem then follows from the result we
established above that a randomly chosen permutation has at
least two distinct cycles in its standard bipartite graph with a
high probability.

To that end, consider a permutation, $\sigma$, with at least two
distinct cycles in the standard bipartite graph. Let $A :=
(a_1, a_2, \ldots, a_{\ell_1})$ and $B := (b_1, b_2, \ldots, b_{\ell_2})$ denote the first
two cycles in the standard bipartite graph; clearly, $\ell_1, \ell_2 \ge 2$
and at least one of $\ell_1, \ell_2$ is $\le n/2$. Without loss of generality
we assume that $\ell_2 \le n/2$. Let $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_r)$. Since
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r$, we have $\lambda_r \le n/2$. First, we consider
the case when $\lambda_r < n/2$. Now consider the $\lambda$-partition, $t_1$,
of $n$ constructed as follows: $a_1$ placed in the $r$th partition,
$a_2$ in the first partition, all the elements of the second cycle
$b_1, b_2, \ldots, b_{\ell_2}$ arbitrarily in the first $r-1$ partitions and the
rest placed arbitrarily. Note that such a construction is possible
by the assumption on $\lambda_r$. Let $t_1'$ denote $\sigma(t_1)$; then, $t_1' \ne t_1$
because $t_1$ does not contain $a_2$ in the $r$th partition while $t_1'$
contains $\sigma(a_1) = a_2$ in the $r$th partition. Thus, the partition
$t_1$ belongs to a cycle that has a length of at least 2 partitions.
Thus, we have found one cycle, which we denote by $C_1$. Now
consider a second partition $t_2$ constructed as follows: $b_1$ placed
in the $r$th partition, $b_2$ in the first and the rest placed arbitrarily.
Again, note that $\sigma(t_2) \ne t_2$. Thus, $t_2$ belongs to a cycle of
length at least 2, which we denote by $C_2$. Now we have found
two cycles $C_1, C_2$, and we are left with proving that they are
distinct. In order to establish that the cycles are distinct, note that
none of the partitions in cycle $C_1$ can be $t_2$. This is true
because, by construction, $t_2$ contains $b_1$ in the $r$th partition
while none of the partitions in $C_1$ can contain any elements
from the cycle $B$ in the $r$th partition. This finishes the proof
for all $\lambda$ such that $\lambda_r < n/2$.

We now consider the case when $\lambda_r = n/2$. Since $\lambda_1 \ge
\lambda_r$, it follows that $r = 2$ and $\lambda = (n/2, n/2)$. For $\ell_2 <
n/2$, it is still feasible to construct $t_1$ and $t_2$, and the theorem
follows from the arguments above. Now we consider the case
when $\ell_1 = \ell_2 = n/2$; let $\ell := \ell_1 = \ell_2$. Note that now it
is infeasible to construct $t_1$ as described above. Therefore,
we consider $t_1 = \{a_1, b_2, \ldots, b_\ell\}\{b_1, a_2, \ldots, a_\ell\}$ and $t_2 =
\{b_1, a_2, \ldots, a_\ell\}\{a_1, b_2, \ldots, b_\ell\}$. Clearly, $t_1 \ne t_2$, $\sigma(t_1) \ne t_1$
and $\sigma(t_2) \ne t_2$. Thus, $t_1$ and $t_2$ belong to two cycles, $C_1$
and $C_2$, each with length at least 2. It is easy to see that
these cycles are also distinct because every $\lambda$-partition in the
cycle $C_1$ will have only one element from cycle $A$ in the first
partition and, hence, $C_1$ cannot contain the $\lambda$-partition $t_2$.
This completes the proof of the theorem.



VII. Proof of Theorem III.3: $\lambda = (n-1, 1)$

Our interest is in recovering a random function $f$ from
partial information $\hat f(\lambda)$. To this end, let

$$K = \|f\|_0, \quad \mathrm{supp}(f) = \{\sigma_k \in S_n : 1 \le k \le K\}, \quad \text{and} \quad f(\sigma_k) = p_k, \ 1 \le k \le K.$$

Here $\sigma_k$ and $p_k$ are randomly chosen as per the random model
$R(K, \mathcal{C})$ described in Section II. For $\lambda = (n-1, 1)$, $D_\lambda = n$;
then $\hat f(\lambda)$ is an $n \times n$ matrix with its $(i, j)$th entry being

$$\hat f(\lambda)_{ij} = \sum_{k : \sigma_k(j) = i} p_k, \quad \text{for } 1 \le i, j \le n.$$
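
As a concrete illustration of this matrix, the sketch below builds the first-order marginals for a small sparse function. The function name, the 0-indexed tuple encoding of permutations, and the example values are assumptions made for this illustration.

```python
def first_order_marginals(support, values):
    """Return the n x n matrix whose (i, j) entry is the sum of values[k]
    over all k with support[k][j] == i (permutations are 0-indexed tuples,
    with sigma[j] the image of j)."""
    n = len(support[0])
    F = [[0] * n for _ in range(n)]
    for sigma, p in zip(support, values):
        for j in range(n):
            F[sigma[j]][j] += p
    return F

# f puts weight 2 on the identity and weight 5 on the transposition (1 2)
F = first_order_marginals([(0, 1, 2), (1, 0, 2)], [2, 5])
total = sum(map(sum, F))
```

Each value $p_k$ appears in exactly $n$ entries (one per column), so the entries sum to $n (p_1 + \cdots + p_K)$, here $3 \times 7 = 21$.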



To establish Theorem III.3, we prove that as long as $K \le
C_1 n \log n$ with $C_1 = 1 - \varepsilon$, $f$ can be recovered by the sparsest-fit
algorithm with probability $1 - o(1)$ for any fixed $\varepsilon > 0$.
Specifically, we show that for $K \le C_1 n \log n$, Condition 1
is satisfied with probability $1 - o(1)$, which in turn implies
that the sparsest-fit algorithm recovers $f$ as per Theorem III.1.
Note that the "linear independence" property of Condition 1
is satisfied with probability 1 under $R(K, \mathcal{C})$ as the $p_k$ are chosen
from a distribution with continuous support. Therefore, we are
left with establishing the "unique witness" property.

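To this end, let $4\delta = \varepsilon$ so that $C_1 \le 1 - 4\delta$. Let $\mathcal{E}_k$ be
the event that $\sigma_k$ satisfies the unique witness property, $1 \le
k \le K$. Under $R(K, \mathcal{C})$, since the $K$ permutations are chosen
from $S_n$ independently and uniformly at random, it follows
that $\Pr(\mathcal{E}_k)$ is the same for all $k$. Therefore, by the union bound,
it is sufficient to establish that $K \Pr(\mathcal{E}_1^c) = o(1)$. Since we
are interested in $K = O(n \log n)$, it is sufficient to establish
$\Pr(\mathcal{E}_1^c) = O(1/n^2)$. Finally, once again due to the symmetry, it is
sufficient to evaluate $\Pr(\mathcal{E}_1)$ assuming $\sigma_1 = \mathrm{id}$, i.e. $\sigma_1(i) = i$
for all $1 \le i \le n$. Define

$$\mathcal{F}_j = \{\sigma_k(j) \ne j, \text{ for } 2 \le k \le K\}, \quad \text{for } 1 \le j \le n.$$

It then follows that

$$\Pr(\mathcal{E}_1) = \Pr\left(\cup_{j=1}^{n} \mathcal{F}_j\right).$$

Therefore, for any $L \le n$, we have

$$\Pr(\mathcal{E}_1^c) = \Pr\left(\cap_{j=1}^{n} \mathcal{F}_j^c\right) \le \Pr\left(\cap_{j=1}^{L} \mathcal{F}_j^c\right) = \Pr(\mathcal{F}_1^c) \prod_{j=2}^{L} \Pr\left(\mathcal{F}_j^c \,\middle|\, \cap_{\ell=1}^{j-1} \mathcal{F}_\ell^c\right). \tag{13}$$

Next we show that for the selection of $L = n^{1-\delta}$, the RHS
of (13) is bounded above by $\exp(-n^{\delta}) = O(1/n^2)$. That will
complete the proof of achievability.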

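For that, we start by bounding $\Pr(\mathcal{F}_1^c)$:

$$\Pr(\mathcal{F}_1^c) = 1 - \Pr(\mathcal{F}_1) = 1 - \left(1 - \frac{1}{n}\right)^{K-1}. \tag{14}$$

The last equality follows because all permutations are chosen
uniformly at random. For $j \ge 2$, we now evaluate
$\Pr\left(\mathcal{F}_j^c \mid \cap_{\ell=1}^{j-1} \mathcal{F}_\ell^c\right)$. Given $\cap_{\ell=1}^{j-1} \mathcal{F}_\ell^c$,
for any $k$, $2 \le k \le K$,
$\sigma_k(j)$ will take a value from $n - j + 1$ values, possibly including
$j$, uniformly at random. Thus, we obtain the following bound:

$$\Pr\left(\mathcal{F}_j^c \,\middle|\, \cap_{\ell=1}^{j-1} \mathcal{F}_\ell^c\right) \le 1 - \left(1 - \frac{1}{n - j + 1}\right)^{K-1}. \tag{15}$$

From (13)-(15), we obtain that

$$\Pr(\mathcal{E}_1^c) \le \prod_{j=1}^{L} \left[1 - \left(1 - \frac{1}{n-j+1}\right)^{K-1}\right] \le \left[1 - \left(1 - \frac{1}{n-L}\right)^{K}\right]^{L} \le \left[1 - \left(1 - \frac{1}{n-L}\right)^{C_1 n \log n}\right]^{L}, \tag{16}$$

where we have used $K \le C_1 n \log n$ in the last inequality.
Since $L = n^{1-\delta}$, $n - L = n(1 - o(1))$. Using the standard
fact $1 - x = e^{-x}(1 + O(x^2))$ for small $x \in [0, 1)$, we have

$$1 - \frac{1}{n-L} = \exp\left(-\frac{1}{n-L}\right)\left(1 + O\left(\frac{1}{n^2}\right)\right). \tag{17}$$

Finally, observe that

$$\left(1 + O\left(\frac{1}{n^2}\right)\right)^{C_1 n \log n} = \Theta(1).$$

Therefore, from (16) and (17), it follows that

$$\Pr(\mathcal{E}_1^c) \le \left[1 - \Theta\left(\exp\left(-\frac{C_1 \log n}{1 - n^{-\delta}}\right)\right)\right]^{L} \le \left[1 - \Theta\left(\exp\left(-(C_1 + \delta) \log n\right)\right)\right]^{L} = \left[1 - \Theta\left(\frac{1}{n^{C_1 + \delta}}\right)\right]^{L} \le \exp\left(-\Theta\left(\frac{L}{n^{C_1 + \delta}}\right)\right) \le \exp\left(-\Theta\left(n^{2\delta}\right)\right), \tag{18}$$

where we have used the fact that $1 - x \le e^{-x}$ for $x \in [0, 1]$
and $L = n^{1-\delta}$, $C_1 \le 1 - 4\delta$. From (18), it follows that
$\Pr(\mathcal{E}_1^c) = O(1/n^2)$. This completes the proof of achievability.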



VIII. Proof of Theorem III.4: $\lambda = (n-m, m)$

Our interest is in recovering the random function $f$ from
partial information $\hat f(\lambda)$. As in the proof of Theorem III.3, we
use the notation

$$K = \|f\|_0, \quad \mathrm{supp}(f) = \{\sigma_k \in S_n : 1 \le k \le K\}, \quad \text{and} \quad f(\sigma_k) = p_k, \ 1 \le k \le K.$$

Here $\sigma_k$ and $p_k$ are randomly chosen as per the random model
$R(K, \mathcal{C})$ described in Section II. For $\lambda = (n-m, m)$, $D_\lambda =
\frac{n!}{(n-m)!\,m!} = \Theta(n^m)$, and $\hat f(\lambda)$ is a $D_\lambda \times D_\lambda$ matrix.

To establish Theorem III.4, we shall prove that as long as
$K \le C_1 n^m \log n$ with $0 < C_1 < \frac{1}{m!}$ a constant, $f$ can be
recovered by the sparsest-fit algorithm with probability $1 -
o(1)$. We shall do so by verifying that Condition 1 holds
with probability $1 - o(1)$, so that the sparsest-fit algorithm will
recover $f$ as per Theorem III.1. As noted earlier, the "linear
independence" property of Condition 1 is satisfied with probability 1
under $R(K, \mathcal{C})$. Therefore, we are left with establishing the
"unique witness" property.

To this end, for the purpose of bounding, without loss of
generality, let us assume that $K = \frac{(1 - 2\delta)}{m!} n^m \log n$ for some
$\delta > 0$. Set $L = n^{1-\delta}$. Following arguments similar to those in
the proof of Theorem III.3, it will be sufficient to establish that
$\Pr(\mathcal{E}_1^c) = O(1/n^{2m})$, where $\mathcal{E}_1$ is the event that permutation
$\sigma_1 = \mathrm{id}$ satisfies the unique witness property.
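
For $\lambda = (n-m, m)$, a $\lambda$-partition is determined by which $m$ elements form the second part, so there are $\binom{n}{m}$ of them and a permutation acts by mapping that set element-wise. The following sketch enumerates these objects for a small case and counts the permutations that fix one given partition; the function names and the 0-indexed encoding are assumptions made for this illustration.

```python
from itertools import combinations, permutations
from math import comb, factorial

def two_part_partitions(n, m):
    """All lambda-partitions for lambda = (n - m, m), each identified by
    its second part, a frozenset of m elements of {0, ..., n - 1}."""
    return [frozenset(c) for c in combinations(range(n), m)]

def act(sigma, part):
    """Image of a partition under sigma (sigma[x] is the image of x)."""
    return frozenset(sigma[x] for x in part)

n, m = 4, 2
partitions = two_part_partitions(n, m)
t1 = frozenset(range(n - m, n))   # second part {n-m, ..., n-1}

# Number of permutations of {0, ..., n-1} that map t1 to itself
fixing = sum(1 for s in permutations(range(n)) if act(s, t1) == t1)
```

In this small case the fraction of permutations fixing $t_1$ is $m!\,(n-m)!/n! = 1/\binom{n}{m}$, which is the kind of quantity the unique witness argument has to control for a uniformly random permutation.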

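To this end, recall that $\hat f(\lambda)$ is a $D_\lambda \times D_\lambda$ matrix. Each
row (and column) of this matrix corresponds to a distinct $\lambda$
partition of $n$: $t_i$, $1 \le i \le D_\lambda$. Without loss of generality, let
us order the $D_\lambda$ $\lambda$ partitions of $n$ so that the $i$th partition, $t_i$,
is defined as follows: $t_1 = \{1, \ldots, n-m\}\{n-m+1, \ldots, n\}$,
and for $2 \le i \le L$,

$$t_i = \{1, \ldots, n - im, \ n - (i-1)m + 1, \ldots, n\}\{n - im + 1, \ldots, n - (i-1)m\}.$$

Note that since $\sigma_1 = \mathrm{id}$, we have $\sigma_1(t_i) = t_i$ for all $1 \le i \le
D_\lambda$. Define

$$\mathcal{F}_j = \{\sigma_k(t_j) \ne t_j, \text{ for } 2 \le k \le K\}, \quad \text{for } 1 \le j \le D_\lambda.$$

Then it follows that

$$\Pr(\mathcal{E}_1) = \Pr\left(\cup_{j=1}^{D_\lambda} \mathcal{F}_j\right).$$

Therefore,

$$\Pr(\mathcal{E}_1^c) = \Pr\left(\cap_{j=1}^{D_\lambda} \mathcal{F}_j^c\right) \le \Pr\left(\cap_{j=1}^{L} \mathcal{F}_j^c\right) = \Pr(\mathcal{F}_1^c) \prod_{j=2}^{L} \Pr\left(\mathcal{F}_j^c \,\middle|\, \cap_{\ell=1}^{j-1} \mathcal{F}_\ell^c\right). \tag{19}$$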



First, we bound $\Pr(\mathcal{F}_1^c)$. Each permutation $\sigma_k$, $k \ne 1$, maps
$t_1 = \{1, \ldots, n-m\}\{n-m+1, \ldots, n\}$ to $\{\sigma_k(1), \ldots, \sigma_k(n-m)\}
\{\sigma_k(n-m+1), \ldots, \sigma_k(n)\}$. Therefore, $\sigma_k(t_1) = t_1$ iff
$\sigma_k$ maps the set of elements $\{n-m+1, \ldots, n\}$ to the same set
of elements. Therefore,

$$\Pr(\sigma_k(t_1) = t_1) = \frac{1}{\binom{n}{m}} \le \frac{m!}{(n - Lm)^m}. \tag{20}$$

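Therefore, it follows that

$$\Pr(\mathcal{F}_1^c) = 1 - \Pr(\mathcal{F}_1) = 1 - \Pr(\sigma_k(t_1) \ne t_1, \ 2 \le k \le K) = 1 - \prod_{k=2}^{K} \left(1 - \Pr(\sigma_k(t_1) = t_1)\right) \le 1 - \left(1 - \frac{m!}{(n - Lm)^m}\right)^{K}. \tag{21}$$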



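Next we evaluate $\Pr\left(\mathcal{F}_j^c \mid \cap_{\ell=1}^{j-1} \mathcal{F}_\ell^c\right)$ for $2 \le j \le L$.
Given $\cap_{\ell=1}^{j-1} \mathcal{F}_\ell^c$, we have (at least partial) information about
the action of $\sigma_k$, $2 \le k \le K$, over the elements $\{n - (j-1)m + 1, \ldots, n\}$.
Conditional on this, we are interested in
the action of $\sigma_k$ on $t_j$, i.e. $\{n - jm + 1, \ldots, n - jm + m\}$.
Specifically, we want to (upper) bound the probability that
these elements are mapped to themselves. Given $\cap_{\ell=1}^{j-1} \mathcal{F}_\ell^c$,
each $\sigma_k$ will map $\{n - jm + 1, \ldots, n - jm + m\}$ to one of
the $\binom{n - (j-1)m}{m}$ possibilities with equal probability. Further,
$\{n - jm + 1, \ldots, n - jm + m\}$ is not a possibility. Therefore,
for the purpose of upper bound, we obtain that

$$\Pr\left(\mathcal{F}_j^c \,\middle|\, \cap_{\ell=1}^{j-1} \mathcal{F}_\ell^c\right) \le 1 - \left(1 - \frac{1}{\binom{n - (j-1)m}{m}}\right)^{K-1} \le 1 - \left(1 - \frac{m!}{(n - Lm)^m}\right)^{K}. \tag{22}$$

From (19)-(22), we obtain that

$$\Pr(\mathcal{E}_1^c) \le \left[1 - \left(1 - \frac{m!}{(n - Lm)^m}\right)^{K}\right]^{L}. \tag{23}$$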

Now Lm = o{n) and hence n — Lm = n(l — o(l)). Using 
1 — X — e^^(l + 0(2:^)) for small x e [0, 1), we have 

1 



exp 



(n — LmY 
m,\ 



l + O 



1 



(n — Lm)'"- J \ \n 
Finally, observe that since K = 0{n'^ logn) 



2rn 



(24) 



1 + 



1 



= 9(1). 



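Thus, from (23) and (24), it follows that

$$\begin{aligned}
\Pr(\mathcal{E}_1^c) &\le \Big[1 - \Theta\Big(\exp\Big(-\frac{K\, m!}{(n-Lm)^m}\Big)\Big)\Big]^{L} \\
&\le \Big[1 - \Theta\Big(\exp\Big(-\frac{(1-2\delta)\log n}{(1 - n^{-\delta} m)^m}\Big)\Big)\Big]^{L} \\
&\le \Big[1 - \Theta\big(\exp(-(1 - 3\delta/2)\log n)\big)\Big]^{L} \\
&= \Big[1 - \Theta\Big(\frac{1}{n^{1-3\delta/2}}\Big)\Big]^{L} \\
&\le \exp\Big(-\Theta\Big(\frac{L}{n^{1-3\delta/2}}\Big)\Big) \\
&\le \exp\big(-\Omega(n^{\delta/2})\big) = O\Big(\frac{1}{n^{2m}}\Big).
\end{aligned} \tag{25}$$

In the above, we have used the fact that $1 - x \le e^{-x}$ for $x \in [0,1]$ and the choice of $L = n^{1-\delta}$. This completes the proof of Theorem III.4.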
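The probability computed in (20) can be checked by brute force for small parameters. The sketch below enumerates all permutations, counts those that map the last $m$ elements onto themselves as a set, and compares the result with $1/\binom{n}{m}$ and with the weaker bound $m!/(n-Lm)^m$. The function name `prob_block_fixed` and the values of `n`, `m`, `L` are illustrative assumptions, not taken from the paper.

```python
from itertools import permutations
from math import comb, factorial


def prob_block_fixed(n: int, m: int) -> float:
    """Exact probability that a uniform permutation of {0..n-1} maps the
    last m elements onto themselves as a set (brute force, small n only)."""
    block = set(range(n - m, n))
    hits = sum(
        1 for p in permutations(range(n))
        if {p[i] for i in block} == block
    )
    return hits / factorial(n)


# Hypothetical small parameters; the proof itself needs L*m = o(n).
n, m, L = 7, 2, 2
p_exact = prob_block_fixed(n, m)
p_bound = factorial(m) / (n - L * m) ** m
print(p_exact, 1 / comb(n, m), p_bound)
```

The enumeration costs $n!$ steps, so it is only meant as a sanity check of the counting argument, not as a tool for realistic $n$.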

IX. Proof of Theorem III.5: $\lambda_1 = n - n^{2/9 - \delta}$, $\delta > 0$

So far we have obtained the sharp result that the sparsest-fit algorithm recovers $f$ up to sparsity essentially $\frac{1}{m!} n^m \log n$ for $\lambda$ with $\lambda_1 = n - m$ where $m = O(1)$. Now we investigate this further when $m$ scales with $n$, i.e. $m = \omega(1)$. Let $\lambda_1 = n - \mu$ with $\mu \le n^{2/9 - \delta}$ for some $\delta > 0$. For such $\lambda$,

$$D_\lambda \le \frac{n!}{\lambda_1!}. \tag{26}$$



Our interest is in the case when $K \le (1 - \varepsilon) D_\lambda \log\log D_\lambda$ for any $\varepsilon > 0$. For this, the structure of arguments will be similar to those used in Theorems III.3 and III.4. Specifically, it will be sufficient to establish that $\Pr(\mathcal{E}_1^c) = O(1/D_\lambda^2)$, where $\mathcal{E}_1$ is the event that permutation $\sigma_1 = \text{id}$ satisfies the unique witness property.

To this end, we order the rows (and corresponding columns) of the $D_\lambda \times D_\lambda$ matrix $\hat{f}(\lambda)$ in a specific manner. Specifically,
we are interested in the L = Shb log'^ n rows that we 
call $t_\ell$, $1 \le \ell \le L$, and they are as follows: the first row, $t_1$, corresponds to a partition where elements $\{1, \dots, \lambda_1\}$ belong to the first part and $\{\lambda_1 + 1, \dots, n\}$ are partitioned into the remaining $r - 1$ parts of size $\lambda_2, \dots, \lambda_r$ in that order. The partition $t_2$ corresponds to the one in which the first part contains the $\lambda_1$ elements $\{1, \dots, n - 2\mu, n - \mu + 1, \dots, n\}$, while the other $r - 1$ parts contain $\{n - 2\mu + 1, \dots, n - \mu\}$ in that order. More generally, for $3 \le \ell \le L$, $t_\ell$ contains $\{1, \dots, n - \ell\mu, n - (\ell - 1)\mu + 1, \dots, n\}$ in the first part and the remaining elements $\{n - \ell\mu + 1, \dots, n - (\ell - 1)\mu\}$ in the rest of the $r - 1$ parts in that order. By our choice of $L$, $L\mu = o(n)$ and, hence, the above is well defined. Next, we bound $\Pr(\mathcal{E}_1^c)$ using these $L$ rows.

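Now $\sigma_1 = \text{id}$ and hence $\sigma_1(t_i) = t_i$ for all $1 \le i \le D_\lambda$. Define

$$\mathcal{F}_j = \{\sigma_k(t_j) \neq t_j, \text{ for } 2 \le k \le K\}, \quad \text{for } 1 \le j \le D_\lambda.$$

Then it follows that

$$\Pr(\mathcal{E}_1) = \Pr\Big(\bigcup_{j=1}^{D_\lambda} \mathcal{F}_j\Big).$$

Therefore,

$$\Pr(\mathcal{E}_1^c) = \Pr\Big(\bigcap_{j=1}^{D_\lambda} \mathcal{F}_j^c\Big) \le \Pr\Big(\bigcap_{j=1}^{L} \mathcal{F}_j^c\Big) = \Pr(\mathcal{F}_1^c) \prod_{j=2}^{L} \Pr\Big(\mathcal{F}_j^c \,\Big|\, \bigcap_{\ell=1}^{j-1} \mathcal{F}_\ell^c\Big). \tag{27}$$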



First, we bound $\Pr(\mathcal{F}_1^c)$. Each permutation $\sigma_k$, $1 < k \le K$, maps $t_1$ to one of the $D_\lambda$ possible $\lambda$-partitions with equal probability. Therefore, it follows that

$$\Pr(\sigma_k(t_1) = t_1) = \frac{1}{D_\lambda}. \tag{28}$$



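Thus,

$$\begin{aligned}
\Pr(\mathcal{F}_1^c) &= 1 - \Pr(\mathcal{F}_1) \\
&= 1 - \Pr(\sigma_k(t_1) \neq t_1,\ 2 \le k \le K) \\
&= 1 - \prod_{k=2}^{K} \big(1 - \Pr(\sigma_k(t_1) = t_1)\big) \\
&\le 1 - \Big(1 - \frac{1}{D_\lambda}\Big)^{K}.
\end{aligned} \tag{29}$$

Next we evaluate $\Pr\big(\mathcal{F}_j^c \mid \bigcap_{\ell=1}^{j-1} \mathcal{F}_\ell^c\big)$ for $2 \le j \le L$. Given $\bigcap_{\ell=1}^{j-1} \mathcal{F}_\ell^c$, we have (at least partial) information about the action of $\sigma_k$, $2 \le k \le K$, over elements $\{n - (j-1)\mu + 1, \dots, n\}$. Conditional on this, we are interested in the action of $\sigma_k$ on $t_j$. Given the partial information, each of the $\sigma_k$ will map $t_j$ to one of at least $D_{\lambda(j)}$ different options with equal probability, for $\lambda(j) = (\lambda_1 - (j-1)\mu, \lambda_2, \dots, \lambda_r)$. This is because the elements $1, \dots, \lambda_1 - (j-1)\mu$ in the first part and all elements in the remaining $r - 1$ parts are mapped completely randomly conditional on $\bigcap_{\ell=1}^{j-1} \mathcal{F}_\ell^c$. Therefore, it follows that

$$\Pr\Big(\mathcal{F}_j^c \,\Big|\, \bigcap_{\ell=1}^{j-1} \mathcal{F}_\ell^c\Big) \le 1 - \Big(1 - \frac{1}{D_{\lambda(j)}}\Big)^{K}. \tag{30}$$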

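From (27)-(30), we obtain that

$$\Pr(\mathcal{E}_1^c) \le \prod_{j=1}^{L} \Big[1 - \Big(1 - \frac{1}{D_{\lambda(j)}}\Big)^{K}\Big] \le \Big[1 - \Big(1 - \frac{1}{D_{\lambda(L)}}\Big)^{K}\Big]^{L}. \tag{31}$$

In the above, we have used the fact that $D_\lambda = D_{\lambda(1)} \ge \dots \ge D_{\lambda(L)}$. Consider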

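$$\frac{D_{\lambda(1)}}{D_{\lambda(j)}} = \frac{n!}{(n - (j-1)\mu)!} \cdot \frac{(\lambda_1 - (j-1)\mu)!}{\lambda_1!} = \prod_{\ell=0}^{(j-1)\mu - 1} \frac{n - \ell}{\lambda_1 - \ell} = \Big(\frac{n}{\lambda_1}\Big)^{(j-1)\mu} \prod_{\ell=0}^{(j-1)\mu - 1} \frac{1 - \ell/n}{1 - \ell/\lambda_1}. \tag{32}$$

Therefore, it follows that

$$\frac{D_{\lambda(1)}}{D_{\lambda(L)}} = \Big(\frac{n}{\lambda_1}\Big)^{(L-1)\mu} \prod_{\ell=0}^{(L-1)\mu - 1} \frac{1 - \ell/n}{1 - \ell/\lambda_1}. \tag{33}$$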



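Using $1 + x \le e^{x}$ for any $x \in (-1, 1)$, $1 - x \ge e^{-2x}$ for $x \in (0, 1/2)$ and $L\mu = o(n)$, we have that for any $\ell$, $0 \le \ell \le L\mu$,

$$\frac{1 - \ell/n}{1 - \ell/\lambda_1} = \frac{1 - \frac{\ell}{n} + \frac{\ell}{\lambda_1} - \frac{\ell^2}{n\lambda_1}}{1 - \frac{\ell^2}{\lambda_1^2}} \le \exp\Big(\frac{\ell\mu - \ell^2}{n\lambda_1} + \frac{2\ell^2}{\lambda_1^2}\Big) \le \exp\Big(\frac{\ell\mu}{n\lambda_1} + \frac{2\ell^2}{\lambda_1^2}\Big). \tag{34}$$

Therefore, we obtain

$$\frac{D_{\lambda(1)}}{D_{\lambda(L)}} \le \Big(\frac{n}{\lambda_1}\Big)^{L\mu} \exp\Big(O\Big(\frac{L^2\mu^3}{n\lambda_1} + \frac{L^3\mu^3}{\lambda_1^2}\Big)\Big). \tag{35}$$

Now

$$\Big(\frac{n}{\lambda_1}\Big)^{L\mu} = \Big(1 + \frac{\mu}{\lambda_1}\Big)^{L\mu} \le \exp\Big(\frac{L\mu^2}{\lambda_1}\Big). \tag{36}$$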



It can be checked that for the given choice of $L$, $\mu$, we have $L\mu^2 = o(\lambda_1)$, $L^3\mu^3 = o(\lambda_1^2)$, and $L^2\mu^3 = o(n\lambda_1)$. Therefore, in summary we have that

$$\frac{D_{\lambda(1)}}{D_{\lambda(L)}} = 1 + o(1). \tag{37}$$
Using similar approximations to evaluate the bound on the RHS of (31), along with (26), yields

$$\begin{aligned}
\Pr(\mathcal{E}_1^c) &\le \exp\Big(-L \exp\Big(-\frac{K}{D_{\lambda(L)}}\Big)\Big) \\
&= \exp\big(-L \exp(-(1 - \varepsilon)\log\log D_\lambda\,(1 + o(1)))\big) \\
&\le \exp\big(-L \exp(-\log\log D_\lambda)\big) \\
&= \exp\Big(-\frac{L}{\log D_\lambda}\Big) \\
&\le \exp(-2\log D_\lambda) = \frac{1}{D_\lambda^2}.
\end{aligned} \tag{38}$$

This completes the proof of Theorem III.5.

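X. Proof of Theorem III.6: General $\lambda$

We shall establish the bound on sparsity up to which recovery of $f$ is possible from $\hat{f}(\lambda)$ using the sparsest-fit algorithm for general $\lambda$. Let $\lambda = (\lambda_1, \dots, \lambda_r)$, $r \ge 2$, with $\lambda_1 \ge \dots \ge \lambda_r \ge 1$. As before, let

$$K = \|f\|_0, \quad \text{supp}(f) = \{\sigma_k \in S_n : 1 \le k \le K\}, \quad \text{and } f(\sigma_k) = p_k,\ 1 \le k \le K.$$

Here $\sigma_k$ and $p_k$ are randomly chosen as per the random model $R(K, \mathscr{C})$ described earlier. We are given partial information $\hat{f}(\lambda)$, which is a $D_\lambda \times D_\lambda$ matrix with

$$D_\lambda = \frac{n!}{\prod_{i=1}^{r} \lambda_i!}.$$

Finally, recall the definition $\alpha = (\alpha_i)_{1 \le i \le r}$ with $\alpha_i = \lambda_i / n$, $1 \le i \le r$,

$$H(\alpha) = -\sum_{i=1}^{r} \alpha_i \log \alpha_i, \quad \text{and} \quad H'(\alpha) = -\sum_{i=2}^{r} \alpha_i \log \alpha_i.$$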



As usual, to establish that the sparsest-fit algorithm recovers $f$ from $\hat{f}(\lambda)$, we will need to establish the "unique witness" property, as "linear independence" is satisfied due to the choice of the $p_k$'s as per the random model $R(K, \mathscr{C})$.

For ease of exposition, we will need the additional notion of a $\lambda$-bipartite graph: it is a complete bipartite graph $G^\lambda = (V_1^\lambda \times V_2^\lambda, E^\lambda)$ with vertex sets $V_1^\lambda$, $V_2^\lambda$ having a node each for every distinct $\lambda$-partition of $n$, and thus $|V_1^\lambda| = |V_2^\lambda| = D_\lambda$. The action of a permutation $\sigma \in S_n$, represented by a 0/1 valued $D_\lambda \times D_\lambda$ matrix, is equivalent to a perfect matching in $G^\lambda$. In this notation, a permutation $\sigma$ has a "unique witness" with respect to a collection of permutations if and only if there is an edge in the matching corresponding to $\sigma$ that is not present in any other permutation's matching.
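For the simplest case $\lambda = (n-1, 1)$, where each node of the bipartite graph is a single element, the matching-based definition above can be sketched directly in code. This is only an illustration under that assumption; the names `edges`, `has_unique_witness` and `all_have_unique_witness` are hypothetical and do not come from the paper.

```python
from typing import Sequence, Set, Tuple

Perm = Tuple[int, ...]  # perm[i] is the image of element i


def edges(perm: Perm) -> Set[Tuple[int, int]]:
    """Edges (left i, right perm[i]) of the perfect matching of a permutation."""
    return {(i, j) for i, j in enumerate(perm)}


def has_unique_witness(k: int, perms: Sequence[Perm]) -> bool:
    """True if perms[k] has an edge that no other permutation's matching uses."""
    others: Set[Tuple[int, int]] = set()
    for idx, q in enumerate(perms):
        if idx != k:
            others |= edges(q)
    return bool(edges(perms[k]) - others)


def all_have_unique_witness(perms: Sequence[Perm]) -> bool:
    return all(has_unique_witness(k, perms) for k in range(len(perms)))


# Identity and a 3-cycle share no edge, so both have witnesses.
good = [(0, 1, 2), (1, 2, 0)]
# Every edge of the identity is covered by one of the three transpositions.
bad = [(0, 1, 2), (1, 0, 2), (0, 2, 1), (2, 1, 0)]
print(all_have_unique_witness(good), all_have_unique_witness(bad))
```

For a general $\lambda$ the same check would run over $\lambda$-partitions rather than single elements, which this sketch does not attempt.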

Let $\mathcal{E}_L$ denote the event that $L \ge 2$ permutations chosen uniformly at random satisfy the "unique witness" property. To establish Theorem III.6, we wish to show that $\Pr(\mathcal{E}_K^c) = o(1)$ as long as $K \le K_1^*(\lambda)$, where $K_1^*(\lambda)$ is as defined earlier. To do so, we shall study $\Pr(\mathcal{E}_{L+1}^c \mid \mathcal{E}_L)$ for $L \ge 1$. Now consider the bipartite graph $G_L^\lambda$, which is a subgraph of $G^\lambda$, formed by the superimposition of the perfect matchings corresponding to the $L$ random permutations $\sigma_i$, $1 \le i \le L$. Now, the probability of $\mathcal{E}_{L+1}^c$ given that $\mathcal{E}_L$ has happened is equal to the probability that a new permutation, generated uniformly at random, has its perfect matching so that all its edges end up overlapping with those of $G_L^\lambda$. Therefore, in order to evaluate this probability we count the number of such permutations.

For ease of exposition, we will first count the number of such permutations for the cases when $\lambda = (n-1, 1)$ followed by $\lambda = (n-2, 2)$. Later, we shall extend the analysis to a general $\lambda$. As mentioned before, for $\lambda = (n-1, 1)$, the corresponding $G^\lambda$ is a complete bipartite graph with $n$ nodes on the left and right. With a bit of abuse of notation, let the left and right vertices be labeled $1, 2, \dots, n$. Now each permutation, say $\sigma \in S_n$, corresponds to a perfect matching in $G^\lambda$ with an edge from left $i$ to right $j$ if and only if $\sigma(i) = j$. Now, consider $G_L^\lambda$, the superimposition of all the perfect matchings of the given $L$ permutations. We want to count (or obtain an upper bound on) the number of permutations whose corresponding perfect matching has all of its edges overlapping with edges of $G_L^\lambda$. Now each permutation maps a vertex on the left to a vertex on the right. In the graph $G_L^\lambda$, each vertex $i$ on the left has degree at most $L$. Therefore, if we wish to choose a permutation so that all of its perfect matching's edges overlap with those of $G_L^\lambda$, it has at most $L$ choices for each vertex on the left. There are $n$ vertices in total on the left. Therefore, the total number of choices is bounded above by $L^n$. From this, we conclude that for $\lambda = (n-1, 1)$,

$$\Pr(\mathcal{E}_{L+1}^c \mid \mathcal{E}_L) \le \frac{L^n}{n!}.$$
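The counting argument for $\lambda = (n-1, 1)$ can be checked by enumeration on a small instance: superimpose the matchings of $L$ permutations and count how many permutations use only those edges. The sketch below is an illustration under assumed small values of `n` and `L`; the helper name `count_covered_permutations` and the fixed random seed are not from the paper.

```python
from itertools import permutations
from math import factorial
import random


def count_covered_permutations(n, chosen):
    """Number of permutations of {0..n-1} whose matching uses only edges
    (i, s[i]) that appear in at least one of the chosen permutations."""
    allowed = {(i, s[i]) for s in chosen for i in range(n)}
    return sum(
        1 for p in permutations(range(n))
        if all((i, p[i]) in allowed for i in range(n))
    )


random.seed(0)  # fixed seed only so the illustration is repeatable
n, L = 6, 3
chosen = [tuple(random.sample(range(n), n)) for _ in range(L)]
covered = count_covered_permutations(n, chosen)
# Each left vertex has at most L allowed images, hence covered <= L**n.
print(covered, L ** n, covered / factorial(n))
```

The ratio `covered / factorial(n)` is the exact probability that a fresh uniform permutation is fully covered, which the text bounds from above by $L^n / n!$.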

In a similar manner, when $\lambda = (n-2, 2)$, the complete bipartite graph $G^\lambda$ has $D_\lambda = \binom{n}{2}$ nodes on the left and right; each permutation corresponds to a perfect matching in this graph. We label each vertex, on left and right, in $G^\lambda$ by unordered pairs $\{i, j\}$, for $1 \le i < j \le n$. Again, we wish to bound $\Pr(\mathcal{E}_{L+1}^c \mid \mathcal{E}_L)$. For this, let $G_L^\lambda$, a subgraph of $G^\lambda$, be obtained by the union of edges that belong to the perfect matchings of the given $L$ permutations. We would like to count the number of possible permutations whose corresponding matching has edges overlapping with those of $G_L^\lambda$. For this, we consider the $\lfloor n/2 \rfloor$ pairs $\{1, 2\}, \{3, 4\}, \dots, \{2\lfloor n/2 \rfloor - 1, 2\lfloor n/2 \rfloor\}$. Now if $n$ is even then they end up covering all $n$ elements. If not, we consider the last, $n$th element, $\{n\}$, as an additional set.

Now using a similar argument as before, we conclude that there are at most $L^{\lfloor n/2 \rfloor}$ ways of mapping these $\lfloor n/2 \rfloor$ pairs such that all of these edges overlap with the edges of $G_L^\lambda$. Note that this mapping fixes what each of these $\lfloor n/2 \rfloor$ unordered pairs gets mapped to. Given this mapping, there are $2!$ ways of fixing the order in each unordered pair. For example, if an unordered pair $\{i, j\}$ maps to unordered pair $\{k, l\}$ then there are $2! = 2$ options: $i \mapsto k, j \mapsto l$ or $i \mapsto l, j \mapsto k$. Thus, once we fix the mapping of each of the $\lfloor n/2 \rfloor$ disjoint unordered pairs, there can be at most $(2!)^{\lfloor n/2 \rfloor}$ permutations with the given mapping of unordered pairs. Finally, note that once the mapping of these $\lfloor n/2 \rfloor$ pairs is decided, if $n$ is even then there is no element left to be mapped. For $n$ odd, since the mapping of the $n - 1$ elements is decided, so is that of $\{n\}$. Therefore, in summary, in both the even $n$ and odd $n$ cases, there are at most $L^{\lfloor n/2 \rfloor} (2!)^{\lfloor n/2 \rfloor}$ permutations that have all of the edges of the corresponding perfect matching in $G^\lambda$ overlapping with the edges of $G_L^\lambda$. Therefore,

$$\Pr(\mathcal{E}_{L+1}^c \mid \mathcal{E}_L) \le \frac{1}{n!} L^{\lfloor n/2 \rfloor} (2!)^{\lfloor n/2 \rfloor}.$$



Now consider the case of general $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_r)$. Let $M = \lfloor n/(n - \lambda_1) \rfloor$ and $N = n - M(n - \lambda_1)$. Clearly, $0 \le N < n - \lambda_1$. Now we partition the set $\{1, 2, \dots, n\}$ into $M + 1$ partitions covering all elements: $\{1, \dots, n - \lambda_1\}, \dots, \{(n - \lambda_1)(M - 1) + 1, \dots, (n - \lambda_1)M\}$ and $\{(n - \lambda_1)M + 1, \dots, n\}$. As before, for the purpose of upper bounding the number of permutations that have corresponding perfect matchings in $G^\lambda$ overlapping with edges of $G_L^\lambda$, each of the first $M$ partitions can be mapped in $L$ different ways; in total at most $L^M$ ways. For each of these mappings, we have at most

$$(\lambda_2!\, \lambda_3! \cdots \lambda_r!)^{M}$$

options. Given the mapping of the first $M$ partitions, the mapping of the $N$ elements of the $(M+1)$st partition is determined (without ordering). Therefore, the additional choice is at most $N!$. In summary, the total number of permutations can be at most

$$L^{M} \Big(\prod_{i=2}^{r} \lambda_i!\Big)^{M} N!.$$

Using this bound, we obtain

$$\Pr(\mathcal{E}_{L+1}^c \mid \mathcal{E}_L) \le \frac{1}{n!} L^{M} \Big(\prod_{i=2}^{r} \lambda_i!\Big)^{M} N!. \tag{39}$$

Let

$$x_L = \frac{1}{n!} L^{M} \Big(\prod_{i=2}^{r} \lambda_i!\Big)^{M} N!.$$



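Note that $\mathcal{E}_{k+1} \subset \mathcal{E}_k$ for $k \ge 1$. Therefore, it follows that

$$\Pr(\mathcal{E}_K) = \Pr(\mathcal{E}_K \mid \mathcal{E}_{K-1}) \Pr(\mathcal{E}_{K-1}). \tag{40}$$

By recursive application of the argument behind (40) and the fact that $\Pr(\mathcal{E}_1) = 1$, we have

$$\begin{aligned}
\Pr(\mathcal{E}_K) &= \Pr(\mathcal{E}_1) \prod_{L=1}^{K-1} \Pr(\mathcal{E}_{L+1} \mid \mathcal{E}_L) \\
&= \prod_{L=1}^{K-1} \big(1 - \Pr(\mathcal{E}_{L+1}^c \mid \mathcal{E}_L)\big) \\
&\ge \prod_{L=1}^{K-1} (1 - x_L) \\
&\ge 1 - \Big(\sum_{L=1}^{K-1} x_L\Big).
\end{aligned} \tag{41}$$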



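Using (39), it follows that $x_{k+1} \ge x_k$ for $k \ge 1$. Therefore,

$$\sum_{L=1}^{K-1} x_L \le K x_K = \frac{K^{M+1}}{n!} \Big(\prod_{i=2}^{r} \lambda_i!\Big)^{M} N! = \frac{K^{M+1}}{D_\lambda^{M}} \binom{n}{\lambda_1}^{M} \frac{N!\, ((n - \lambda_1)!)^{M}}{n!}. \tag{42}$$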



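Since $n = N + M(n - \lambda_1)$, we have a binomial and a multinomial coefficient in the RHS of (42). We simplify this expression by obtaining an approximation for a multinomial coefficient through Stirling's approximation. For that, first consider a general multinomial coefficient $m!/(k_1!\, k_2! \cdots k_l!)$ with $m = \sum_i k_i$. Then, using Stirling's approximation $\log n! = n \log n - n + 0.5 \log n + O(1)$, for any $n$, we obtain

$$\begin{aligned}
\log \frac{m!}{k_1!\, k_2! \cdots k_l!} &= m \log m - m + 0.5 \log m + O(1) - \sum_{i=1}^{l} \big(k_i \log k_i - k_i + 0.5 \log k_i + O(1)\big) \\
&= \sum_{i=1}^{l} k_i \log \frac{m}{k_i} + 0.5 \log \frac{m}{k_1 k_2 \cdots k_l} + O(l).
\end{aligned}$$

Thus, we can write

$$M \log \frac{n!}{\lambda_1!\,(n - \lambda_1)!} = M n \alpha_1 \log \frac{1}{\alpha_1} + M n (1 - \alpha_1) \log \frac{1}{1 - \alpha_1} + 0.5 \log \frac{1}{n^{M} \alpha_1^{M} (1 - \alpha_1)^{M}} + O(M), \tag{43}$$

where $\alpha_1 = \lambda_1 / n$. Similarly, we can write

$$\log \frac{n!}{N!\,((n - \lambda_1)!)^{M}} = n \delta \log \frac{1}{\delta} + M n (1 - \alpha_1) \log \frac{1}{1 - \alpha_1} + 0.5 \log \frac{1}{n^{M} \delta (1 - \alpha_1)^{M}} + O(M), \tag{44}$$

where $\delta = N/n$. It now follows from (43) and (44) that

$$M \log \frac{n!}{\lambda_1!\,(n - \lambda_1)!} - \log \frac{n!}{N!\,((n - \lambda_1)!)^{M}} = -M n \alpha_1 \log \alpha_1 + \delta n \log \delta + 0.5 \log \frac{\delta}{\alpha_1^{M}} + O(M). \tag{45}$$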
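The Stirling-based estimate of a multinomial coefficient used above can be compared numerically with the exact value. The sketch below computes $\log \frac{m!}{k_1! \cdots k_l!}$ through `math.lgamma` and subtracts the two leading terms of the estimate; the remaining gap should stay within a constant multiple of $l$. The function names and the sample values `ks` are illustrative assumptions, and all $k_i$ are assumed to be at least 1.

```python
from math import lgamma, log


def log_multinomial(ks):
    """Exact log of m! / (k_1! ... k_l!) with m = sum(ks), via log-gamma."""
    m = sum(ks)
    return lgamma(m + 1) - sum(lgamma(k + 1) for k in ks)


def stirling_estimate(ks):
    """Leading terms: sum_i k_i log(m/k_i) + 0.5 log(m / (k_1 ... k_l))."""
    m = sum(ks)
    main = sum(k * log(m / k) for k in ks)
    correction = 0.5 * (log(m) - sum(log(k) for k in ks))
    return main + correction


ks = (50, 30, 20)  # hypothetical part sizes, each >= 1
gap = log_multinomial(ks) - stirling_estimate(ks)
print(gap)  # the O(l) remainder; bounded in absolute value by len(ks)
```

The bound on the gap follows because the $O(1)$ term in Stirling's formula lies between $0.5 \log 2\pi$ and $1$ for every positive integer.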
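Since $\delta < 1$, $\delta n \log \delta \le 0$ and $\log(\delta / \alpha_1^{M}) \le -M \log \alpha_1$. Thus, we can write

$$\begin{aligned}
M \log \frac{n!}{\lambda_1!\,(n - \lambda_1)!} - \log \frac{n!}{N!\,((n - \lambda_1)!)^{M}} &\le M n \alpha_1 \log(1/\alpha_1) + O(M \log(1/\alpha_1)) \\
&= O(M n \alpha_1 \log(1/\alpha_1)).
\end{aligned} \tag{46}$$

It now follows from (42), (45) and (46) that

$$\log \Big(\sum_{L=1}^{K-1} x_L\Big) \le (M + 1) \log K - M \log D_\lambda + O(M n \alpha_1 \log(1/\alpha_1)). \tag{47}$$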

Therefore, for $\Pr(\mathcal{E}_K) = 1 - o(1)$, a sufficient condition is

$$\log K \le -\frac{c \log n}{M + 1} + \frac{M}{M + 1} \log D_\lambda - \frac{M}{M + 1}\, O(n \alpha_1 \log(1/\alpha_1)) \tag{48}$$

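for some $c > 0$. We now claim that $\log n = O(M n \alpha_1 \log(1/\alpha_1))$. The claim is clearly true for $\alpha_1 \to \theta$ for some $0 < \theta < 1$. Now suppose $\alpha_1 \to 1$. Then, $M \ge 1/(1 - \alpha_1) - 1 = \alpha_1/(1 - \alpha_1) = x$, say. This implies that $M \alpha_1 \log(1/\alpha_1) \ge \alpha_1 x \log(1 + 1/x) \to 1$ as $\alpha_1 \to 1$. Thus, $M n \alpha_1 \log(1/\alpha_1) = n(1 + o(1))$ for $\alpha_1 \to 1$ as $n \to \infty$. Hence, the claim is true for $\alpha_1 \to 1$ as $n \to \infty$. Finally, consider $\alpha_1 \to 0$ as $n \to \infty$. Note that the function $h(x) = x \log(1/x)$ is increasing on $(0, \epsilon)$ for some $0 < \epsilon < 1$. Thus, for $n$ large enough, $n \alpha_1 \log(1/\alpha_1) \ge \log n$ since $\alpha_1 \ge 1/n$. Since $M \ge 1$, it now follows that $M n \alpha_1 \log(1/\alpha_1) \ge \log n$ for $n$ large enough and $\alpha_1 \to 0$. This establishes the claim. Since $\log n = O(M n \alpha_1 \log(1/\alpha_1))$, it now follows that (48) is implied by

$$\begin{aligned}
\log K &\le \frac{M}{M + 1} \log D_\lambda - \frac{M}{M + 1}\, O(n \alpha_1 \log(1/\alpha_1)) \\
&= \frac{M}{M + 1} \log D_\lambda \Big[1 - \frac{O(n \alpha_1 \log(1/\alpha_1))}{\log D_\lambda}\Big].
\end{aligned} \tag{49}$$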



Now consider $D_\lambda = n!/(\lambda_1!\, \lambda_2! \cdots \lambda_r!)$. Then, we claim that for large $n$

$$\log D_\lambda \ge 0.5\, n H(\alpha). \tag{50}$$

In order to see why the claim is true, note that Stirling's approximation suggests

$$\log n! = n \log n - n + 0.5 \log n + O(1), \qquad \log \lambda_i! = \lambda_i \log \lambda_i - \lambda_i + 0.5 \log \lambda_i + O(1).$$

Therefore,

$$\log D_\lambda \ge n H(\alpha) + 0.5 \log(n/\lambda_1) - \sum_{i=2}^{r} 0.5\,(O(1) + \log \lambda_i).$$

Now consider

$$\lambda_i \log(n/\lambda_i) - \log \lambda_i - O(1) = \Big(\lambda_i - \frac{\log \lambda_i}{\log(n/\lambda_i)}\Big) \log(n/\lambda_i) - O(1). \tag{51}$$

Since $\lambda_i \le n/2$ for $i \ge 2$, $\log(n/\lambda_i) \ge \log 2$. Thus, the first term in the RHS of (51) is non-negative for any $\lambda_i \ge 1$. In addition, for every $\lambda_i$, either $\lambda_i - \log \lambda_i \to \infty$ or $\log(n/\lambda_i) \to \infty$ as $n \to \infty$. Therefore, the term on the RHS of (51) is asymptotically non-negative. Hence,

$$\log D_\lambda \ge 0.5\, n H(\alpha). \tag{52}$$

Thus, it now follows from (50) that (49) is implied by

$$\log K \le \frac{M}{M + 1} \log D_\lambda \Big[1 - \frac{O(\alpha_1 \log(1/\alpha_1))}{H(\alpha)}\Big].$$

That is, we have the "unique witness" property satisfied as long as

$$K = O\big(D_\lambda^{\gamma(\alpha)}\big), \tag{53}$$

where

$$\gamma(\alpha) = \frac{M}{M + 1} \Big[1 - C'\, \frac{H(\alpha) - H'(\alpha)}{H(\alpha)}\Big], \tag{54}$$

and $C'$ is some constant. This completes the proof of Theorem III.6.

XI. Proof of Theorem III.7: Limitation on Recovery

In order to make a statement about the inability of any algorithm to recover $f$ using $\hat{f}(\lambda)$, we rely on the formalism of classical information theory. In particular, we establish a bound on the sparsity of $f$ beyond which recovery is not asymptotically reliable (a precise definition of asymptotic reliability is provided later).

A. Information theory preliminaries

Here we recall some necessary information theory preliminaries. Further details can be found in the book by Cover and Thomas [26].

Consider a discrete random variable $X$ that is uniformly distributed over a finite set $\mathcal{X}$. Let $X$ be transmitted over a noisy channel to a receiver; suppose the receiver receives a random variable $Y$, which takes values in a finite set $\mathcal{Y}$. Essentially, such a "transmission over noisy channel" setup describes any two random variables $X$, $Y$ defined through a joint probability distribution over a common probability space.

Now let $\hat{X} = g(Y)$ be an estimate of the transmitted information that the receiver produces based on the observation $Y$ using some function $g : \mathcal{Y} \to \mathcal{X}$. Define the probability of error as $p_{\text{err}} = \Pr(X \neq \hat{X})$. Since $X$ is uniformly distributed over $\mathcal{X}$, it follows that

$$p_{\text{err}} = \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} \Pr(g(Y) \neq x \mid x). \tag{55}$$



Recovery of $X$ is called asymptotically reliable if $p_{\text{err}} \to 0$ as $|\mathcal{X}| \to \infty$. Therefore, in order to show that recovery is not asymptotically reliable, it is sufficient to prove that $p_{\text{err}}$ is bounded away from 0 as $|\mathcal{X}| \to \infty$. In order to obtain a lower bound on $p_{\text{err}}$, we use Fano's inequality:

$$H(X \mid \hat{X}) \le 1 + p_{\text{err}} \log |\mathcal{X}|. \tag{56}$$
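Using (56), we can write

$$\begin{aligned}
H(X) &= I(X; \hat{X}) + H(X \mid \hat{X}) \\
&\le I(X; \hat{X}) + p_{\text{err}} \log |\mathcal{X}| + 1 \\
&\overset{(a)}{\le} I(X; Y) + p_{\text{err}} \log |\mathcal{X}| + 1 \\
&= H(Y) - H(Y \mid X) + p_{\text{err}} \log |\mathcal{X}| + 1 \\
&\le H(Y) + p_{\text{err}} \log |\mathcal{X}| + 1,
\end{aligned} \tag{57}$$

where we used $H(Y \mid X) \ge 0$ for a discrete*-valued random variable. The inequality (a) follows from the data processing inequality: if we have a Markov chain $X \to Y \to \hat{X}$, then $I(X; \hat{X}) \le I(X; Y)$. Since $H(X) = \log |\mathcal{X}|$, from (57) we obtain

$$p_{\text{err}} \ge 1 - \frac{H(Y) + 1}{\log |\mathcal{X}|}. \tag{58}$$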

Therefore, to establish that the probability of error is bounded away from zero, it is sufficient to show that

$$\frac{H(Y) + 1}{\log |\mathcal{X}|} \le 1 - \delta, \tag{59}$$

for some fixed constant $\delta > 0$.
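The lower bound in (58) is simple to evaluate once $H(Y)$ and $|\mathcal{X}|$ are known. The following sketch assumes entropies are measured in bits, which is the convention under which the additive constant 1 in Fano's inequality applies; the function name `fano_error_lower_bound` is hypothetical.

```python
import math


def fano_error_lower_bound(entropy_y_bits: float, domain_size: int) -> float:
    """Lower bound p_err >= 1 - (H(Y) + 1) / log2|X| from (58),
    clipped at 0 because a negative bound carries no information."""
    if domain_size < 2:
        raise ValueError("domain_size must be at least 2")
    bound = 1.0 - (entropy_y_bits + 1.0) / math.log2(domain_size)
    return max(0.0, bound)


# With H(Y) = 3 bits and |X| = 2**10, the bound is 1 - 4/10.
print(fano_error_lower_bound(3.0, 2 ** 10))
```

When $H(Y) + 1 \ge \log_2 |\mathcal{X}|$ the function returns 0, which matches the fact that condition (59) fails in that regime.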



B. Proof of Theorem III.7

Our goal is to show that when $K$ is large enough (in particular, as claimed in the statement of Theorem III.7), the probability of error of any recovery algorithm is uniformly bounded away from 0. For that, we first fix a recovery algorithm, and then utilize the above setup to show that recovery is not asymptotically reliable when $K$ is large. Specifically, we use (59), for which we need to identify random variables $X$ and $Y$.

To this end, for a given $K$ and $T$, let $f$ be generated as per the random model $R(K, T)$. Let random variable $X$ represent the support of function $f$, i.e., $X$ takes values in $\mathcal{X} = S_n^K$. Given $\lambda$, let $\hat{f}(\lambda)$ be the partial information that the recovery algorithm uses to recover $f$. Let random variable $Y$ represent $\hat{f}(\lambda)$, the $D_\lambda \times D_\lambda$ matrix. Let $h = h(Y)$ denote the estimate of $f$, and $g = g(Y) = \text{supp}\, h$ denote the estimate of the support of $f$ produced by the given recovery algorithm. Then,

$$\Pr(h \neq f) \ge \Pr(\text{supp}\, h \neq \text{supp}\, f) = \Pr(g(Y) \neq X). \tag{60}$$

Therefore, in order to uniformly lower bound the probability of error of the recovery algorithm, it is sufficient to lower bound its probability of making an error in recovering the support of $f$. Therefore, we focus on

$$p_{\text{err}} = \Pr(g(Y) \neq X).$$

It follows from the discussion in Section XI-A that in order to show that $p_{\text{err}}$ is uniformly bounded away from 0, it is sufficient to show that for some constant $\delta > 0$

$$\frac{H(Y) + 1}{\log |\mathcal{X}|} \le 1 - \delta. \tag{61}$$



*The counterpart of this inequality for a continuous valued random variable is not true. This led us to study the limitation of recovery algorithms over model $R(K, T)$ rather than $R(K, \mathscr{C})$.



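Observe that $|\mathcal{X}| = (n!)^K$. Therefore, using Stirling's approximation, it follows that

$$\log |\mathcal{X}| = (1 + o(1)) K n \log n. \tag{62}$$

Now $Y = \hat{f}(\lambda)$ is a $D_\lambda \times D_\lambda$ matrix. Let $Y = [Y_{ij}]$ with $Y_{ij}$, $1 \le i, j \le D_\lambda$, taking values in $\{1, \dots, KT\}$; it is easy to see that $H(Y_{ij}) \le \log KT$. Therefore, it follows that

$$H(Y) \le \sum_{i,j=1}^{D_\lambda} H(Y_{ij}) \le D_\lambda^2 \log KT = D_\lambda^2 (\log K + \log T). \tag{63}$$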

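For a small enough constant $\delta > 0$, it is easy to see that the condition of (61) will follow if $K$ satisfies the following two inequalities:

$$\frac{D_\lambda^2 \log K}{K n \log n} \le \frac{1}{3}(1 + \delta) \;\Longleftarrow\; \frac{K}{\log K} \ge \frac{3(1 - \delta/2) D_\lambda^2}{n \log n}, \tag{64}$$

$$\frac{D_\lambda^2 \log T}{K n \log n} \le \frac{1}{3}(1 + \delta) \;\Longleftarrow\; K \ge \frac{3(1 - \delta/2) D_\lambda^2 \log T}{n \log n}. \tag{65}$$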

In order to obtain a bound on $K$ from (64), consider the following: for large numbers $x$, $y$, let $y = (c + \varepsilon) x \log x$, for some constants $c, \varepsilon > 0$. Then, $\log y = \log x + \log\log x + \log(c + \varepsilon)$, which is $(1 + o(1)) \log x$. Therefore,

$$\frac{y}{\log y} = \frac{c + \varepsilon}{1 + o(1)}\, x \ge c x, \tag{66}$$

for $x \to \infty$ and constants $c, \varepsilon > 0$. Also, observe that $y/\log y$ is a non-decreasing function; hence, it follows that for $y \ge (c + \varepsilon) x \log x$, $y/\log y \ge c x$ for large $x$. Now take $x = \frac{D_\lambda^2}{n \log n}$, $c = 3$, $\varepsilon = 1$ and $y = K$. Note that $D_\lambda \ge n$ for all $\lambda$ of interest; therefore, $x \to \infty$ as $n \to \infty$. Hence, (64) is satisfied for the choice of

$$K \ge \frac{4 D_\lambda^2}{n \log n} \log\Big(\frac{D_\lambda^2}{n \log n}\Big). \tag{67}$$

From (61), (64), (65), and (67) it follows that the probability of error of any algorithm is at least $\delta > 0$ for $n$ large enough and any $\lambda$ if

$$K \ge \frac{4 D_\lambda^2}{n \log n} \Big[\log\Big(\frac{D_\lambda^2}{n \log n} \vee T\Big)\Big]. \tag{68}$$

This completes the proof of Theorem III.7.
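To get a feel for the scale of the sparsity threshold, the sketch below evaluates a threshold of the form appearing in (67), with the maximum against $T$ included, for a small partition, and then checks that the ratio $(H(Y) + 1)/\log|\mathcal{X}|$ falls below 1 when $H(Y)$ is replaced by its upper bound (63). The constant 4 follows (67); the function names, the choice $\lambda = (19, 1)$ and $T = 2$ are illustrative assumptions, and $\log|\mathcal{X}| = K \log n!$ is computed exactly rather than through (62).

```python
import math


def d_lambda(partition):
    """D_lambda = n! / (lambda_1! ... lambda_r!) for a partition of n."""
    n = sum(partition)
    out = math.factorial(n)
    for part in partition:
        out //= math.factorial(part)
    return out


def sparsity_threshold(partition, T, const=4.0):
    """const * x * log(max(x, T)) with x = D_lambda^2 / (n log n)."""
    n = sum(partition)
    x = d_lambda(partition) ** 2 / (n * math.log(n))
    return const * x * math.log(max(x, T))


def entropy_ratio(partition, K, T):
    """(H(Y) + 1) / log2|X| with H(Y) <= D^2 log2(K T) and |X| = (n!)^K."""
    n = sum(partition)
    D = d_lambda(partition)
    h_y_bits = D * D * math.log2(K * T)
    log_x_bits = K * math.lgamma(n + 1) / math.log(2)
    return (h_y_bits + 1.0) / log_x_bits


lam, T = (19, 1), 2  # hypothetical small instance
K = math.ceil(sparsity_threshold(lam, T))
print(K, entropy_ratio(lam, K, T))
```

A ratio below 1 means the Fano bound (58) yields a strictly positive lower bound on the error probability at that sparsity; the asymptotic statements in the text of course concern $n \to \infty$, not a fixed small $n$.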

XII. Conclusion 

In summary, we considered the problem of exactly recov- 
ering a non-negative function over the space of permutations 
from a given partial set of Fourier coefficients. This problem 
is motivated by the wide ranging applications it has across 
several disciplines. This problem has been widely studied in 
the context of discrete-time functions in the recently popular 
compressive sensing literature. However, unlike our setup, 
where we want to perform exact recovery from a given set of 
Fourier coefficients, the work in the existing literature pertains 
to the choice of a limited set of Fourier coefficients that can 
be used to perform exact recovery. 

Inspired by the work of Donoho and Stark [1] in the context of discrete-time functions, we focused on the recovery of non-negative functions with a sparse support (support size $\ll$ domain size). Our recovery scheme consisted of finding the function with the sparsest support, consistent with the given information, through $\ell_0$ optimization. As we showed through some counter-examples, this procedure, however, will not recover the exact solution in all cases. Thus, we identified sufficient conditions under which a function can be recovered through $\ell_0$ optimization. For each kind of partial information, we then quantified the sufficient conditions in terms of the "complexity" of the functions that can be recovered. Since the sparsity (support size) of a function is a natural measure of its complexity, we quantified the sufficient conditions in terms of the sparsity of the function. In particular, we proposed a natural random generative model for the functions of a given sparsity. Then, we derived bounds on sparsity for which a function generated according to the random model satisfies the sufficient conditions with a high probability as $n \to \infty$. Specifically, we showed that, for partial information corresponding to partition $\lambda$, the sparsity bound essentially scales as $D_\lambda^{M/(M+1)}$. For $\lambda_1/n \to 1$, this bound essentially becomes $D_\lambda$ and for $\lambda_1/n \to 0$, the bound essentially becomes $\sqrt{D_\lambda}$.


Even though we found sufficient conditions for the recoverability of functions by finding the sparsest solution, $\ell_0$ optimization is in general computationally hard to carry out. This problem is typically overcome by considering its convex relaxation, the $\ell_1$ optimization problem. However, we showed that $\ell_1$ optimization fails to recover a function generated by the random model with a high probability. Thus, we proposed a novel iterative algorithm to perform $\ell_0$ optimization for functions that satisfy the sufficient conditions, and extended it to the general case when the underlying distribution may not satisfy the sufficient conditions and the observations may be noisy.

We studied the limitation of any recovery algorithm by 
means of information theoretic tools. While the bounds we 
obtained are useful in general, due to technical limitations, 
they do not apply to the random model we considered. Closing 
this gap and understanding recovery conditions in the presence 
of noise are natural next steps. 
Appendix
Proof of Auxiliary Lemma

Here we present the proof of Lemma III.1. For this, first consider the limit $\alpha_1 \uparrow 1$. Specifically, let $\alpha_1 = 1 - \varepsilon$, for very small positive $\varepsilon$. Then, $\sum_{i=2}^{r} \alpha_i = 1 - \alpha_1 = \varepsilon$. By definition, we have $H'(\alpha)/H(\alpha) \le 1$; therefore, in order to prove that $H'(\alpha)/H(\alpha) \to 1$ as $\alpha_1 \uparrow 1$, it is sufficient to prove that $H'(\alpha)/H(\alpha) \ge 1 - o(1)$ as $\alpha_1 \uparrow 1$. For that, consider

$$\frac{H'(\alpha)}{H(\alpha)} = \frac{H'(\alpha)}{\alpha_1 \log(1/\alpha_1) + H'(\alpha)} = 1 - \frac{\alpha_1 \log(1/\alpha_1)}{\alpha_1 \log(1/\alpha_1) + H'(\alpha)}. \tag{69}$$



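In order to obtain a lower bound, we minimize $H'(\alpha)/H(\alpha)$ over $\alpha \ge 0$. It follows from (69) that, for a given $\alpha_1 = 1 - \varepsilon$, $H'(\alpha)/H(\alpha)$ is minimized for the choice of $\alpha_i$, $i \ge 2$, that minimizes $H'(\alpha)$. Thus, we maximize $\sum_{i=2}^{r} \alpha_i \log \alpha_i$ subject to $\alpha_i \ge 0$ and $\sum_{i=2}^{r} \alpha_i = 1 - \alpha_1 = \varepsilon$. Here we are maximizing a convex function over a convex set. Therefore, the maximum is achieved on the boundary of the convex set. That is, the maximum is $\varepsilon \log \varepsilon$; consequently, the minimum value of $H'(\alpha)$ is $\varepsilon \log(1/\varepsilon)$. Therefore, it follows that for $\alpha_1 = 1 - \varepsilon$,

$$1 \ge \frac{H'(\alpha)}{H(\alpha)} \ge 1 - \frac{-(1 - \varepsilon)\log(1 - \varepsilon)}{\varepsilon \log(1/\varepsilon) - (1 - \varepsilon)\log(1 - \varepsilon)} \approx 1 - \frac{\varepsilon}{\varepsilon \log(1/\varepsilon) + \varepsilon} = 1 - \frac{1}{1 + \log(1/\varepsilon)} \to 1 \quad \text{as } \varepsilon \to 0. \tag{70}$$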



To prove a similar claim for $\alpha_1 \downarrow 0$, let $\alpha_1 = \varepsilon$ for a small, positive $\varepsilon$. Then, it follows that $r = \Omega(1/\varepsilon)$ since $\sum_{i=1}^{r} \alpha_i = 1$ and $\alpha_1 \ge \alpha_i$ for all $i$, $2 \le i \le r$. Using a convex maximization based argument similar to the one we used above, it can be checked that $H'(\alpha) = \Omega(\log(1/\varepsilon))$. Therefore, it follows that $\alpha_1 \log(1/\alpha_1)/H'(\alpha) \to 0$ as $\alpha_1 \downarrow 0$. That is, $H'(\alpha)/H(\alpha) \to 1$ as $\alpha_1 \downarrow 0$. This completes the proof of Lemma III.1.
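The two limits in the lemma can be illustrated numerically. The sketch below computes $H'(\alpha)/H(\alpha)$ for the extremal choice $\alpha = (1 - \varepsilon, \varepsilon)$ used in the first part of the proof and for a uniform $\alpha$ with $r$ parts, where $\alpha_1 = 1/r$ is small. The function name `h_ratio` and the sample values are assumptions made for illustration; all entries of $\alpha$ are assumed strictly positive.

```python
import math


def h_ratio(alpha):
    """H'(alpha) / H(alpha), where H' omits the i = 1 term of the entropy."""
    terms = [-a * math.log(a) for a in alpha]
    return sum(terms[1:]) / sum(terms)


# alpha_1 close to 1: the ratio tracks 1 - 1/(1 + log(1/eps)), slowly -> 1.
eps = 1e-6
near_one = h_ratio((1 - eps, eps))
predicted = 1 - 1 / (1 + math.log(1 / eps))

# alpha_1 close to 0: a uniform alpha with r parts gives exactly 1 - 1/r.
r = 100
near_zero = h_ratio([1 / r] * r)
print(near_one, predicted, near_zero)
```

The convergence for $\alpha_1 \uparrow 1$ is only logarithmic in $1/\varepsilon$, which is visible in how far `near_one` still is from 1 at $\varepsilon = 10^{-6}$.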